Open Bug 1314912 Opened 3 years ago Updated 4 months ago

history.pushState - Firefox Hangs and then iterating script error

Categories

(Core :: DOM: Navigation, defect, P2)

49 Branch
x86
Windows
defect

Tracking

()

People

(Reporter: sachin.raste, Unassigned)

References

(Blocks 1 open bug)

Details

(4 keywords)

Attachments

(4 files)

Attached file index.html
User Agent: Mozilla/5.0 (Windows NT 5.2; rv:49.0) Gecko/20100101 Firefox/49.0
Build ID: 20161019084923

Steps to reproduce:

Opened up the attached html file which contains 

     jQuery(window).load(function(){
                    var total = "";
                    for(var i=0; i<100000; i++){
                        total = total + i.toString();
                        history.pushState(0, 0, total);
                    } });




Actual results:

Firefox Hangs , CPU usage crosses 50% , after some time, get script error .

Havent checked this issue on my phone / tablet, however I believe it should. 


Expected results:

Redirect / Stop gracefully
OS: Unspecified → Windows
Hardware: Unspecified → x86
OS : Windows 2003
WFM in 52.0a1 (2016-11-02) (32-bit) on Win10 x64.
Component: Untriaged → Document Navigation
Product: Firefox → Core
Priority: -- → P3
Nika, can you see if this is easy to fix?
Flags: needinfo?(nika)
Priority: P3 → P2
Couple of profiles that may or may not be useful

STR: Take the script from comment 0, edit the iterations to a smaller number (i.e. 1000 and 3000), run in scratchpad

Here's a loop of 1000 times in a profile https://perfht.ml/2BrMBrF - the 3 jumps on cp4 is the script running. This was just minor hangs, browser still basically usable.

And then https://perfht.ml/2BsHibB which is 3000 loops and locked up the browser for 15 seconds or so
This looks like it was fixed by Chromium in
https://bugs.chromium.org/p/chromium/issues/detail?id=394296. It was fixed by
adding a rate limit to calls to pushState and replaceState.

I figure we might as well do the same thing - this patch adds a similar rate
limit to pushState and replaceState in Gecko.

MozReview-Commit-ID: IThztDz1xKI
Attachment #8933783 - Flags: review?(sawang)
Assignee: nobody → nika
Flags: needinfo?(nika)
I should note that I'm not sure how I would add a reliable automated test for this - so I haven't done so.
bug 1246773 is the same issue (and has 3 other dupes). Even with multi-process this hangs the parent/browser in addition to the child. Manually killing the child process eventually led to me getting my browser back, but took a while (30-60s) before the parent became responsive again.
Duplicate of this bug: 1246773
Duplicate of this bug: 1349537
Duplicate of this bug: 1365392
Duplicate of this bug: 1370212
Please see bug 1380305 comment 22 for some concerns about this approach.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Nika, are you interested in continuing the work here? (if so I will resolve duplicated bug 1380305)

Looks like we do need to handle location.hash. The latest Chrome Canary / Safari Tech Preview are also suffering from location.hash flooding, BTW. Not sure if there're other cases we need to worry about.
(In reply to Samael Wang [:freesamael] from comment #13)
> Nika, are you interested in continuing the work here? (if so I will resolve
> duplicated bug 1380305)
> 
> Looks like we do need to handle location.hash. The latest Chrome Canary /
> Safari Tech Preview are also suffering from location.hash flooding, BTW. Not
> sure if there're other cases we need to worry about.

Your patch looks more polished than mine but I can finish this bug up if you'd like. Handling location.hash should be possible too, we can approach it in the same way.
Duplicate of this bug: 1380305
Then I will continue my work here :)
Assignee: nobody → sawang
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from bug 1380305 comment #23)
> Is this a problem for replaceState too, not just pushState?  I guess that
> would still have the IPC traffic, but not the nonstop growth in data we need
> to store in the parent process, right?
> 

There wasn't much to store in the parent process, and the session history length is limited to 50 entries so there shouldn't be much to store in the child process either. The issue was mainly that the frequency of sendAsyncMessage from the child is a few times faster than the parent can consume, which keeps the parent process busy and the chrome UI extremely non-responsive. 

Besides, it seems the parent process was too busy to schedule GC, so the memory size keeps growing. According to the DMD output in bug 1380305 comment #16 I believe it was the URI generated in RemoteWebNavigation which consumes the memory:
https://searchfox.org/mozilla-central/rev/31606bbabc50b08895d843b9f5f3da938ccdfbbf/toolkit/modules/RemoteWebProgress.jsm#251

On my Linux machine, it can cause thrashing in a few minutes and make the whole system be non-responsive. It seems anything causes infinite OnLocationChange has the same effect. I just realized I can also reproduce it with a few `pushState` and then a simple `history.back` + `history.forward` loop. Now I'm thinking that we should probably throttle all LOCATION_CHANGE_SAME_DOCUMENT navigation.
I'm proposing a more aggressive solution that would require another review.
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from bug 1380305 comment #23)
> The algorithm in this function is bit nondeterministic from the point of
> view of a web page.  It might be able to do 98 pushstate calls in a given
> 10-second period (do one call, wait 9.5 seconds, do 48 calls, wait one
> second, do 49 calls, for a total of 97 calls in a period starting 5s after
> that first call).  Or it might not, depending on exactly how its calls align
> with the timer.  That might be OK, as long as we're very clear in our
> documentation, but seems like it could lead to pages that randomly work some
> times but not others...  Is this basically the algorithm Chrome and Safari
> are using?
> 

Yes I think that's also what Chrome & Safari are doing.

Chromium:
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/frame/History.cpp?l=144-161

Webkit:
https://github.com/WebKit/webkit/blob/679b410c541bdccac78ef40c873ea497a613243a/Source/WebCore/page/History.cpp#L201-L211
Comment on attachment 8935252 [details]
Bug 1314912 - Part 1: Throttle same document navigation caused by content scripts, if there have been too many LOCATION_CHANGE_SAME_DOCUMENT caused by content scripts in a short time frame.

https://reviewboard.mozilla.org/r/206138/#review212020

::: commit-message-cb909:4
(Diff revision 1)
> +Bug 1314912 - Part 1: Throttle same document navigation caused by content scripts, if there have been too many LOCATION_CHANGE_SAME_DOCUMENT caused by content scripts in a short time frame. r?bz
> +
> +The patch records the number of successive LOCATION_CHANGE_SAME_DOCUMENT when
> +the incumbent global exists and it's not system principal, which should

What happens if the page does a bunch of click() on anchors that point to the page url, but with different refs?  That comes via OnLinkClickEvent.  Does the AutoJSAPI there give us an incumbent global?  I would expect it does not, and that situation won't be caught by this code.

Ideally, we would explicitly check whatever would become the triggering principal for the load in the load case...
Attachment #8935252 - Flags: review?(bzbarsky)
Comment on attachment 8935253 [details]
Bug 1314912 - Part 2: Add a few test cases.

https://reviewboard.mozilla.org/r/206140/#review212022

::: docshell/test/navigation/test_bug1314912_back_forward.html:43
(Diff revision 2)
> +    }
> +    for (let i = 0; i < 30; i++) {
> +      history.forward();
> +    }
> +
> +    // The next history navigation should throw.

Hmm.  So this will mean that if too many history.back() happen in a row, they could start failing?

But why should we block back() or forward() ever?  It's _adding_ things to the history that's a problem, no?

::: docshell/test/navigation/test_bug1314912_back_forward.html:47
(Diff revision 2)
> +
> +    // The next history navigation should throw.
> +    SimpleTest.doesThrow(
> +      () => history.back(), "check history.back does throw");
> +
> +    if (hasRepeated) {

!hasRepeated, I would think.

::: docshell/test/navigation/test_bug1314912_back_forward.html:48
(Diff revision 2)
> +    // The next history navigation should throw.
> +    SimpleTest.doesThrow(
> +      () => history.back(), "check history.back does throw");
> +
> +    if (hasRepeated) {
> +      // Repeat the test again after 5 seconds.

Where are we testing that once we've reached the cap other attemptes within the 3s timeout will fail too?

::: docshell/test/navigation/test_bug1314912_hash_change.html:42
(Diff revision 2)
> +    // The next hash change should throw.
> +    SimpleTest.doesThrow(
> +      () => location.hash = ref,
> +      "check setting location.hash does throw");
> +
> +    if (hasRepeated) {

!hasReated.

::: docshell/test/navigation/test_bug1314912_hash_change.html:43
(Diff revision 2)
> +    SimpleTest.doesThrow(
> +      () => location.hash = ref,
> +      "check setting location.hash does throw");
> +
> +    if (hasRepeated) {
> +      // Repeat the test again after 5 seconds.

Again, we should test the within-3-second thing, I would think.

::: docshell/test/navigation/test_bug1314912_pushState.html:42
(Diff revision 2)
> +    // The next pushState should throw.
> +    SimpleTest.doesThrow(
> +      () => history.pushState(null, "test", location.href + "#" + ref),
> +      "check history.pushState does throw");
> +
> +    if (hasRepeated) {

!hasRepeated, and test within the 3s timeout...
Attachment #8935253 - Flags: review?(bzbarsky)
Comment on attachment 8935253 [details]
Bug 1314912 - Part 2: Add a few test cases.

https://reviewboard.mozilla.org/r/206140/#review212022

> Hmm.  So this will mean that if too many history.back() happen in a row, they could start failing?
> 
> But why should we block back() or forward() ever?  It's _adding_ things to the history that's a problem, no?

Not really. Anything that causes excessive location changes can DoS the parent process and hang the chrome UI. See comment 17.

> Where are we testing that once we've reached the cap other attemptes within the 3s timeout will fail too?

I'm worrying about intermittent failures on try server if we do another `setTimeut` within 3 seconds. Bug 1362410 shows that 1 second ExpirationTracker can take more than 3 seconds to run on try server.
> I'm worrying about intermittent failures on try server if we do another `setTimeut` within 3 seconds

Sure.  But when the timer fires you can check how much time has _actually_ passed and decide whether to actually try to assert anything about the state at that point if too much time has passed.
For anyone interested in this bug, feel free to take the patch and continue the work.
Assignee: freesamael → nobody
Duplicate of this bug: 1491964
Duplicate of this bug: 1562245
See Also: → 1588509
You need to log in before you can comment on or make changes to this bug.