Closed Bug 1330793 Opened 7 years ago Closed 7 years ago

Google docs regularly hangs when selecting text

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

x86_64
Windows
defect

Tracking

()

VERIFIED FIXED
mozilla53
Tracking Status
firefox53 + verified

People

(Reporter: jimm, Assigned: jandem)

Details

(Whiteboard: [profiler-example])

Attachments

(2 files, 1 obsolete file)

Attached file js stack.txt
I've been running into content process hangs when selecting text in google docs. For testing purposes I have all extensions disabled. The js stack I get when breaking out in the debugger is attached. Also, have an active profiler session running fixes the problem.
I may have filed this in the wrong component, fwiw, it could be a google site bug. I'm still trying to figure out what this js is trying to do including looking at sync networking, which showed up in at least one profile I generated using the old version of cleopatra. 

The hung web page notification successfully comes up in response to this hang and if I chose Stop It, the page recovers, so our content process js hang handling is working. 

Sometimes this manifests as an in-content google doc error prompt, which instructs the user to reload the page. Next time I get that I'll post a screenshot.
Component: JavaScript Engine → General
Product: Core → Firefox
[Tracking Requested - why for this release]:
Two users have reported this. It makes google docs unusable.
Timing out via the s,low page notification bar helps confirm the stack I've posted - 

13:24:35.109Error: Script terminated by timeout at:
zVa@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:994:18
y.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1392:289
y.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1187:242
y.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1397:97
RF.prototype.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1402:236
y.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1187:242
y.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1197:147
y.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1187:242
Mdb@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1500:54
y.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1499:415
y.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1520:432
y.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1567:219
y.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1576:112
y.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1583:244
y.Pf@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1608:210
Ufb@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1595:28
Sfb@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1594:160
mgb@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1607:189
y.Gvb@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1610:251
Izb@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:2073:254
Qv/<@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:2073:87
y.wC@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:740:345
y.dispatchEvent@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:739:395
FLa@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:746:276
y.Ee@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:742:1182
y.Ee@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1451:1136
nGa@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:1910:354
iGa@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:575:252
y.$0a@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:575:133
nua@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:225:240
jua@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:226:494
c@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:2059:84
hua/b<@https://docs.google.com/static/document/client/js/26392045-kix_main_i18n_kix_core.js:223:51
126392045-kix_main_i18n_kix_core.js:994:18
Unable to reproduce this in dev addition.
(In reply to Jim Mathies [:jimm] from comment #5)
> Unable to reproduce this in dev addition.

I can reproduce in fresh profile in 53.
Markus, any change your thread work here might have triggered an issue in google docs?
Flags: needinfo?(mstange)
OS: Unspecified → Windows
Hardware: Unspecified → x86_64
(In reply to Jim Mathies [:jimm] from comment #7)
> short blame list:
> 
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=6f63f95e28ffc05c0d2f5ef6cd6e05905fe8ea5a&tochange=21dc
> 2d95071cb7712af8552e3880ec325b1069f1
> 

Looks like bug 1323100 landed and bounced a couple of times, as such it never made it to mc until 1/7. So it doesn't appear to be at fault.

2016-12-29 16:24:58 PST  mi landing
2016-12-29 16:29:13 PST  backed out
2016-12-30 14:15:13 PST  mi landing
2016-12-30 16:09:41 PST  backed out
2017-01-06 04:42:50 PST  mi landing
2017-01-07 08:27:35 PST  merged to mc
Flags: needinfo?(mstange)
Tracking 53+ for this issue as I have also heard a few folks on IRC mention running into this issue.
Reran mozregression over a much wider range since my first was pretty tight. This new run is from 12-30 through 1-10. This reproduced 1-10, 1-04, 1-01, and 12-31. New more accurate range. (tip: never start with a tight range in mozregression!)

wide:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=143bb4b9249e528e658f6ccc449991794b8675f8&tochange=6f63f95e28ffc05c0d2f5ef6cd6e05905fe8ea5a

tighter:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=db38dbe32679ad449fecadf6e5f8501d6a0d7234&tochange=05a6f5bcb7a086985dac27c3db82324326434459

cause according to mozregression:
Bug 1323096 - CacheIR: optimize more value[double] cases

This was a jit opt patch, which fits.
Flags: needinfo?(jdemooij)
Component: General → JavaScript Engine: JIT
Product: Firefox → Core
Since you can reproduce this reliably enough to bisect, what are the STR exactly? Is there a particular document you see this on? How much text do you select? What's the platform? Does it hang immediately or do you have to try it a number of times? How many?
Flags: needinfo?(jmathies)
(In reply to Jan de Mooij [:jandem] from comment #12)
> Since you can reproduce this reliably enough to bisect, what are the STR
> exactly? Is there a particular document you see this on? How much text do
> you select? What's the platform? Does it hang immediately or do you have to
> try it a number of times? How many?

A large google doc that has people viewing it works the best. 

STR:

1) open the doc
2) position the cursor near a block of text
3) wait until some of the viewers start popping into view and their cursor positions start showing up
4) click down and slowly drag to select a block of text

If this doesn't freeze up the content process immediately, continue to experiment with selecting blocks of text by slowly expanding and contracting the selection. I usually get a lock up within the first few tries.

FYI, blassey has been running into this on irccloud as well.
Flags: needinfo?(jmathies)
Reproduced on Win7 and Win10.
Also, one alternate manifestation here is a Google Docs error that tells you to reload the page. The content process doesn't lock up in these cases. This happens less often that the freeze.
I've been trying to repro for at least 30 minutes with Nightly builds on OS X and Win64 but no luck so far.

My test document has 54 pages and I opened it in 4 different browsers. Is one of your documents public?
I just tried with a 32-bit build and I got the "reload this document" message. Investigating...
Attached patch Patch (obsolete) — Splinter Review
The problem here seems to be the thing we do for GuardIsInt32 where we can replace the input ValueOperand with an int32 Value. This is fine for Baseline, but for Ion things are more complicated and we just shouldn't modify our IC inputs.

This patch adds GuardIsInt32Index, with an int32 output register, and changes GuardIsInt32 to do just the is-int32 guard.

evilpie, I think this will also make it easier to optimize the -0.0 case?
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Attachment #8827205 - Flags: review?(evilpies)
Attached patch PatchSplinter Review
Attachment #8827205 - Attachment is obsolete: true
Attachment #8827205 - Flags: review?(evilpies)
Attachment #8827206 - Flags: review?(evilpies)
Priority: -- → P1
Comment on attachment 8827206 [details] [diff] [review]
Patch

Review of attachment 8827206 [details] [diff] [review]:
-----------------------------------------------------------------

Yes indeed, this makes allowing -0.0 a one line change. I think the new name makes it obvious enough that this function is for index handling and nothing else.

::: js/src/jit/CacheIR.cpp
@@ +1505,5 @@
>          if (indexSigned < 0)
>              return false;
>  
>          *int32Index = uint32_t(indexSigned);
> +        *int32IndexId = writer.guardIsInt32Index(indexId);

We also use this function in tryAttachStringChar and tryAttachMagicArgument, it should be easy to convert those to maybeGuardInt32Index as well.

::: js/src/jit/CacheIRCompiler.h
@@ +17,5 @@
>  #define CACHE_IR_SHARED_OPS(_)            \
>      _(GuardIsObject)                      \
>      _(GuardIsString)                      \
>      _(GuardIsSymbol)                      \
>      _(GuardIsInt32)                       \

We probably don't need this anymore after the other change.
Attachment #8827206 - Flags: review?(evilpies) → review+
(In reply to Tom Schuster [:evilpie] from comment #20)
> We also use this function in tryAttachStringChar and tryAttachMagicArgument,
> it should be easy to convert those to maybeGuardInt32Index as well.

Actually not very easy because there we want to do all checks first and then emit IR. Using maybeGuardInt32Index makes that a little annoying atm. I changed it to emit GuardIsInt32Index though and removed GuardIsInt32.
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f4966566dfcf
Refactor int32 index guard to not modify the input register. r=evilpie
https://hg.mozilla.org/mozilla-central/rev/f4966566dfcf
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Can you confirm this is fixed now?
Flags: needinfo?(jmathies)
can't reproduce in current nightly.
Flags: needinfo?(jmathies)
Jan - do you know why this would not reproduce with the profiler running? (How much does JIT code change when SPS is active?) And if this *were* reproducible with the profiler running, is there a symptom that the profiler could have reported that would make it easier to track down? (eg, would this have manifested as repeated bailouts at the same location?)
Flags: needinfo?(jdemooij)
Whiteboard: [profiler-example]
(In reply to Steve Fink [:sfink] [:s:] from comment #26)
> Jan - do you know why this would not reproduce with the profiler running?
> (How much does JIT code change when SPS is active?)

Maybe because Ion's register allocator doesn't use the frame pointer register when profiling is enabled:

http://searchfox.org/mozilla-central/rev/30fcf167af036aeddf322de44a2fadd370acfd2f/js/src/jit/RegisterAllocator.h#295-296

This only reproduced on x86 (not x64) for me and we don't have many registers available on x86, so having one less could definitely affect regalloc-related stuff.

> And if this *were*
> reproducible with the profiler running, is there a symptom that the profiler
> could have reported that would make it easier to track down? (eg, would this
> have manifested as repeated bailouts at the same location?)

It's hard to say. It was more a correctness bug than a performance bug (that's why the website sometimes showed that "reload the page" message - they detected something went wrong, maybe they got an exception somewhere) and a bug like that can manifest itself in a lot of ways.
Flags: needinfo?(jdemooij)
Flags: qe-verify+
I've managed to reproduce this issue with an old Nightly build from 2017-01-12 using Win 10 x86.

I can no longer reproduce the hangs nor the google docs error mentioned in comment 15, on 53 beta 2 (20170313224914) under Windows 10 x64/x86, Ubuntu 16.04 x64 LTS and Mac OS X 10.11.6.

Marking this as verified fixed.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: