Closed Bug 1388409 (jaws-jank) Opened 7 years ago Closed 6 years ago

Long jank issues with JAWS and Nightly on text heavy pages

Categories

(Core :: Disability Access APIs, defect, P2)

Product:

Component:

Type:

defect

Priority:

P2

Severity:

normal

Tracking

()

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

firefox57

---

affected

People

(Reporter: jimm, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: aes+)

Attachments

(3 files)

Windows 10 x64-2017-08-08-10-25-01.jpg 7 years ago Jim Mathies [:jimm] 147.23 KB, image/jpeg		Details
mainthreadstack.txt 7 years ago Jim Mathies [:jimm] 4.67 KB, text/plain		Details
InkedScreenshot-2017-8-15 perf html_LI.jpg 7 years ago David Bolter [:davidb] (NeedInfo me for attention) 1.61 MB, image/jpeg		Details

Jim Mathies [:jimm]

Reporter

Description

•

7 years ago

STR: 1) open jaws and then nightly 2) in the one open tab visit the westworld wikipedia page https://en.wikipedia.org/wiki/Westworld_(TV_series) 3) wait for the page to become responsive In vmware / win10 on my workstation this period of time is around ten minutes. Occasionally the browser would simply exit without an error. No reports visible in about:crashes when this happens.

Jim Mathies [:jimm]

Reporter

Comment 1

•

7 years ago

Attached image Windows 10 x64-2017-08-08-10-25-01.jpg — Details

Marco Zehe (:MarcoZ)

Comment 2

•

7 years ago

By comparison, NVDA takes under 1 second for this article with E10S on and on my regular Windows 10 (without a VM). So I suspect even in a VM NVDA will be a lot faster.

alexander :surkov (:asurkov)

Comment 3

•

7 years ago

let's go with P2 as it's JAWS specific (I'm sure we can do it better on our side, but we have to sort out generic perf problems at first)

Priority: -- → P2

Jim Mathies [:jimm]

Reporter

Updated

•

7 years ago

Blocks: 1354238

Jim Mathies [:jimm]

Reporter

Comment 5

•

7 years ago

Here's a profile with a 642 ms text change events: https://perf-html.io/public/ac8fb10c13c8c1912d61c457a9423ac7f264e0b8/calltree/?hiddenThreads=&range=70.5309_71.6750&thread=0&threadOrder=0-2-3-4-1-5 We spend time in a few places all of which are tied to the jaws in-proc dll doing com stuff. The most notable seems to be 72 msec spend in qi'ing a remote interface.

Flags: needinfo?(surkov.alexander)

Tracy Walker [:tracy]

Comment 6

•

7 years ago

Here' a profiler of the westworld wikipedia page with Jaws, Nightly 57.0a1 on Win 10 64bit https://perfht.ml/2uxpOb0

alexander :surkov (:asurkov)

Comment 7

•

7 years ago

(In reply to Jim Mathies [:jimm] from comment #5) > Here's a profile with a 642 ms text change events: > https://perf-html.io/public/ac8fb10c13c8c1912d61c457a9423ac7f264e0b8/ > calltree/?hiddenThreads=&range=70.5309_71.6750&thread=0&threadOrder=0-2-3-4- > 1-5 Is it accurate URL? I don't see 642ms number. I see that 98ms (16%) is spent on PDocAccessible::Msg_ShowEvent and 34% (212ms) is spent by fsDOMSrv.dll. Could you give me please some instructions how to read the profile properly?

Tracy Walker [:tracy]

Comment 8

•

7 years ago

link should be https://en.wikipedia.org/wiki/Westworld_(TV_series)

Tracy Walker [:tracy]

Comment 9

•

7 years ago

hmmm, you'll have to follow the correct link from within wikipedia suggestion.

Tracy Walker [:tracy]

Comment 10

•

7 years ago

profile while in heavy hang at westworld wikipedia page. https://perfht.ml/2vXeRCX

Jim Mathies [:jimm]

Reporter

Comment 11

•

7 years ago

(In reply to Tracy Walker [:tracy] from comment #10) > profile while in heavy hang at westworld wikipedia page. > https://perfht.ml/2vXeRCX Looks like the browser's stuck in a long PDocAccessible::Msg_SyncTextChangeEvent from content.

Jim Mathies [:jimm]

Reporter

Updated

•

7 years ago

Alias: jaws-jank

Comment hidden (offtopic)

Jim Mathies [:jimm]

Reporter

Updated

•

7 years ago

Group: mozilla-employee-confidential

Jim Mathies [:jimm]

Reporter

Comment 13

•

7 years ago

Attached file mainthreadstack.txt — Details

(In reply to Jim Mathies [:jimm] from comment #12) > Here's a dump during a bad hang while loading the westworld page. (taken > from a vm) > > http://mathies.com/mozilla/firefox.zip main thread stack for this hang.

alexander :surkov (:asurkov)

Comment 14

•

7 years ago

(In reply to Jim Mathies [:jimm] from comment #11) > (In reply to Tracy Walker [:tracy] from comment #10) > > profile while in heavy hang at westworld wikipedia page. > > https://perfht.ml/2vXeRCX > > Looks like the browser's stuck in a long > PDocAccessible::Msg_SyncTextChangeEvent from content. What do you mean by long? Is it different than large number of events? It's interesting that on a bottom of the stack is NSWaitForMultipleObjects taking about 100% of time, and there's no any evidence that text events generation itself takes anything, all I can see is the time is spent on events delivering from content to a parent process. Is this observation correct?

Flags: needinfo?(surkov.alexander)

Comment 15

•

7 years ago

Adding NeedInfo as reminder to add perf profiles here.

Flags: needinfo?(gwimberly)

(No longer employed by Mozilla) Aaron Klotz

Comment 16

•

7 years ago

This profile surface a very interesting idea. Some of this requires some history, so please bear with me: My first cut of a11y+e10s used a synchronous message, sent immediately after SendPDocAccessibleConstructor, to move the top-level doc's COM proxy over to the parent. We hit all kind of crashing with that scheme, so instead I figured that maybe we should move the COM proxy stuff into SendPDocAccessibleConstructor. This creates another problem, however, because SendPDocAccessibleConstructor is async. And we don't do any preprocessing on IPDL to allow us to make it sync on Windows only. Based on some suggestions that were brought up in the a11y meeting, I modified our code to do the following: * Content calls SendPDocAccessibleConstructor. When the parent is done processing that, it calls SendCOMProxy which is an async message sent to content that says, "Hey, I'm done with that constructor and here's the COM proxy of the top-level doc's parent" * Between the time that Content calls SendPDocAccessibleConstructor and the time that RecvCOMProxy fires, that PDocAccessible enqueues any events that are supposed to fire on it. * Once RecvCOMProxy fires, the PDocAccessible then fires all of those events that it had accumulated. I'm thinking that this scheme is hurting the ability of a11y code to properly coalesce events, since this none of this enqueuing in the IPDL actors is visible to it. Now that so many crashes have been fixed by other bugs, I think it is worth revisiting the sync SendCOMProxy message. That might fix all of our event problems for free. I can throw together a test patch + try build and we can see how that goes.

alexander :surkov (:asurkov)

Comment 17

•

7 years ago

(In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) from comment #16) > I'm thinking that this scheme is hurting the ability of a11y code to > properly coalesce events, since this none of this enqueuing in the IPDL > actors is visible to it. Can you elaborate on this? I believe the event coalescence happens before we send any events over ipc, no?

Flags: needinfo?(aklotz)

Jim Mathies [:jimm]

Reporter

Comment 18

•

7 years ago

(In reply to alexander :surkov from comment #14) > (In reply to Jim Mathies [:jimm] from comment #11) > > (In reply to Tracy Walker [:tracy] from comment #10) > > > profile while in heavy hang at westworld wikipedia page. > > > https://perfht.ml/2vXeRCX > > > > Looks like the browser's stuck in a long > > PDocAccessible::Msg_SyncTextChangeEvent from content. > > What do you mean by long? Is it different than large number of events? > > It's interesting that on a bottom of the stack is NSWaitForMultipleObjects > taking about 100% of time, and there's no any evidence that text events > generation itself takes anything, all I can see is the time is spent on > events delivering from content to a parent process. Is this observation > correct? Content 1 of 2 is sitting in an ipc wait after calling SendSyncTextChangeEvent. That's pretty much all this thread does for the entire profile. The parent is missing data but there is some on the end.

Jim Mathies [:jimm]

Reporter

Comment 19

•

7 years ago

(In reply to Jim Mathies [:jimm] from comment #18) > (In reply to alexander :surkov from comment #14) > > (In reply to Jim Mathies [:jimm] from comment #11) > > > (In reply to Tracy Walker [:tracy] from comment #10) > > > > profile while in heavy hang at westworld wikipedia page. > > > > https://perfht.ml/2vXeRCX > > > > > > Looks like the browser's stuck in a long > > > PDocAccessible::Msg_SyncTextChangeEvent from content. > > > > What do you mean by long? Is it different than large number of events? > > > > It's interesting that on a bottom of the stack is NSWaitForMultipleObjects > > taking about 100% of time, and there's no any evidence that text events > > generation itself takes anything, all I can see is the time is spent on > > events delivering from content to a parent process. Is this observation > > correct? > > Content 1 of 2 is sitting in an ipc wait after calling > SendSyncTextChangeEvent. That's pretty much all this thread does for the > entire profile. The parent is missing data but there is some on the end. I see a lot of calls by the JAWS module to AccessibleHandler::get_relations. Aaron, is that something we can optimize?

(No longer employed by Mozilla) Aaron Klotz

Comment 20

•

7 years ago

(In reply to Jim Mathies [:jimm] from comment #19) > I see a lot of calls by the JAWS module to AccessibleHandler::get_relations. > Aaron, is that something we can optimize? The main work that we did there was bug 1384328. My understanding is that there are newer APIs that JAWS could switch to on their end that might help. Alex, what is the replacement for get_relations?

Flags: needinfo?(surkov.alexander)

alexander :surkov (:asurkov)

Comment 21

•

7 years ago

relationTargetsOfType [1], which may be a perf relief if JAWS doesn't need to grab all relations at once, or if IAccessibleRelation interface is costy by itself to expose through COM jungle. If you see that they call a lot of get_relations, then it could be that they want to grab all possible relations for a whole document, in which case, the method might be not that helpful as I anticipated. If that's a case, then we probably should reach them asking if they can do something else. [1] http://git.linuxfoundation.org/?p=a11y/ia2.git;a=blob;f=api/Accessible2_2.idl;h=e90c2a348d48828488ce7e657d82ee446c598f7f;hb=HEAD#l114

Flags: needinfo?(surkov.alexander)

alexander :surkov (:asurkov)

Comment 22

•

7 years ago

(In reply to Jim Mathies [:jimm] from comment #18) > Content 1 of 2 is sitting in an ipc wait after calling > SendSyncTextChangeEvent. That's pretty much all this thread does for the > entire profile. The parent is missing data but there is some on the end. So, it is waiting for a parent process to answer? Btw, I see same picture for 'content 2 of 2' and GeckoMain in a pref profile from comment #10. What is a bottleneck then there? Is the inter-process communication a thin place? (In reply to Jim Mathies [:jimm] from comment #19) > I see a lot of calls by the JAWS module to AccessibleHandler::get_relations. > Aaron, is that something we can optimize? Which profile do you look at? Do you see a lot of get_relations calls while we deliver text change events? Are there any show/hide events in a profile? I'm asking because it feels weird if JAWS requests relations on text change events.

David Bolter [:davidb] (NeedInfo me for attention)

Comment 23

•

7 years ago

Attached image InkedScreenshot-2017-8-15 perf html_LI.jpg — Details

(In reply to Jim Mathies [:jimm] from comment #5) > Here's a profile with a 642 ms text change events: > https://perf-html.io/public/ac8fb10c13c8c1912d61c457a9423ac7f264e0b8/ > calltree/?hiddenThreads=&range=70.5309_71.6750&thread=0&threadOrder=0-2-3-4- > 1-5 > > We spend time in a few places all of which are tied to the jaws in-proc dll > doing com stuff. The most notable seems to be 72 msec spend in qi'ing a > remote interface. Hi Jim are you looking here (see attachment)... when I hover over the SendSyncTextChangeEvent I see 610ms...

Flags: needinfo?(jmathies)

David Bolter [:davidb] (NeedInfo me for attention)

Comment 24

•

7 years ago

Ah we see a 624ms (642 was probably a typo) activity marker in content process 1. Cancelling NI.

Flags: needinfo?(jmathies)

alexander :surkov (:asurkov)

Comment 25

•

7 years ago

(In reply to alexander :surkov from comment #22) > (In reply to Jim Mathies [:jimm] from comment #19) > > > I see a lot of calls by the JAWS module to AccessibleHandler::get_relations. > > Aaron, is that something we can optimize? > > Which profile do you look at? > > Do you see a lot of get_relations calls while we deliver text change events? > Are there any show/hide events in a profile? I'm asking because it feels > weird if JAWS requests relations on text change events. Thanks to Markus, demoing some profiling tricks. There's indeed a spike on get_relations [1] but those are called by JAWS. My guess will be we are bombing them with events, and that's how they answer it. It'd be interesting to reach them for their insight on the issue. [1] https://perfht.ml/2w7IsdZ

(No longer employed by Mozilla) Aaron Klotz

Comment 26

•

7 years ago

> (In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now, > please send interceptor reviews to dmajor or handyman) from comment #16) > > > I'm thinking that this scheme is hurting the ability of a11y code to > > properly coalesce events, since this none of this enqueuing in the IPDL > > actors is visible to it. > > Can you elaborate on this? I believe the event coalescence happens before we > send any events over ipc, no? From what I am seeing, with 100% async messaging, no events are coalesced. When we force content to wait for acknowledgement from the parent that the COM proxies have been set up, then the events do become coalesced (see below). (In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) from comment #16) > I can throw together a test patch + try build and we can see how that goes. I have seen some positive results with NVDA on Marco's WWI wikipedia page. The pause is gone there. Trying JAWS now but I am having some deadlock issues.(In reply to alexander :surkov from comment #17)

Flags: needinfo?(aklotz)

(No longer employed by Mozilla) Aaron Klotz

Comment 27

•

7 years ago

(In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) from comment #26) > I have seen some positive results with NVDA on Marco's WWI wikipedia page. > The pause is gone there. > > Trying JAWS now but I am having some deadlock issues. Upon further testing, I take that back. :-(

Comment 28

•

7 years ago

Got a perf from a hang during e10s/a11y testing with the live region: http://bit.ly/2xvaCIu

Flags: needinfo?(gwimberly)

Comment 29

•

7 years ago

Correction to Comment 28: http://bit.ly/2xvACIu

Marco Zehe (:MarcoZ)

Comment 30

•

7 years ago

I just realized that this isn't in our aes list, so putting it there. This is essentially what I am seeing, too.

Whiteboard: aes+

alexander :surkov (:asurkov)

Comment 31

•

7 years ago

(In reply to Aaron Klotz [:aklotz]) > > Can you elaborate on this? I believe the event coalescence happens before we > > send any events over ipc, no? > > From what I am seeing, with 100% async messaging, no events are coalesced. > When we force content to wait for acknowledgement from the parent that the > COM proxies have been set up, then the events do become coalesced (see > below). how exactly you do that ('force content to wait for acknowledgement from the parent')?

Jim Mathies [:jimm]

Reporter

Comment 32

•

7 years ago

removed the crash dump file linked here. opening this back up.

Group: mozilla-employee-confidential

Comment 33

•

7 years ago

Getting long hangs while doing JAWS testing for Pre-Beta: This was a 6-7 minute hang when doing ARIA testing- https://perfht.ml/2f1054o

Jim Mathies [:jimm]

Reporter

Comment 34

•

7 years ago

STR: 1) run JAWS 2) load test case http://oaa-accessibility.org/example/2/ 3) as soon as the page loads, hit ctrl-L 4) wait for the address bar to take focus 5) collect profile https://perfht.ml/2jwsJPT Address bar focus delay was about 30-45 seconds here on a surface pro 4 tablet.

Jim Mathies [:jimm]

Reporter

Comment 35

•

7 years ago

STR: 1) run JAWS 2) load test case http://oaa-accessibility.org/example/4/ 3) as soon as the page loads, hit ctrl-L 4) wait for the address bar to take focus 5) collect profile https://perfht.ml/2jx4kJZ 28 second page load hang

Jim Mathies [:jimm]

Reporter

Comment 36

•

7 years ago

STR: 1) open JAWS then nightly 2) search bing.com for 'westworld' 3) when search results display press ctrl-L 4) wait for the address bar to take focus 5) collect profile https://perfht.ml/2jvLPG3 This collection got a little weird at the end: the browser was still bogged down somewhat after keyboard input returned. In the task manager the tab associated with bing was consuming about 20% cpu with the parent doing the same. I shutdown JAWS and waited but the problem persisted. Finally I terminated the bing content process in the task manager, and then was finally able to ctrl-shift-2 and post this profile.

(No longer employed by Mozilla) Aaron Klotz

Comment 37

•

7 years ago

The profile in comment 36 is interesting because content process 1's message manager is sending a sync message. Then in content process 3, we also end up sending a sync message via PDocAccessible::SendSyncTextChangeEvent. But it looks to me like JAWS is running on the browser's main thread via the app shell event. So content process 1's sync message is held up while JAWS is doing its thing on the main thread.

Jim Mathies [:jimm]

Reporter

Updated

•

7 years ago

Blocks: 1400434

David Bolter [:davidb] (NeedInfo me for attention)

Comment 40

•

7 years ago

Does this require a Jaws fix?

Flags: needinfo?(jmathies)

Jim Mathies [:jimm]

Reporter

Comment 41

•

7 years ago

(In reply to David Bolter [:davidb] (NeedInfo me for attention) from comment #40) > Does this require a Jaws fix? I think it likely requires improvements on both sides.

Flags: needinfo?(jmathies)

Jim Mathies [:jimm]

Reporter

Comment 42

•

7 years ago

Just tested 2018 on one of our reference laptops. I still see hangs on first page load on some pages. I'm not running the latest jaws though, updating now and will collect a profile to see how things look currently in nightly and beta.

Jim Mathies [:jimm]

Reporter

Comment 43

•

7 years ago

(In reply to Jim Mathies [:jimm] from comment #42) > Just tested 2018 on one of our reference laptops. I still see hangs on first > page load on some pages. I'm not running the latest jaws though, updating > now and will collect a profile to see how things look currently in nightly > and beta. Latest releases of JAWS 2018 and Firefox nightly: http://perfht.ml/2gPzTux On our low end reference hardware this combination is pretty much unusable.

Jim Mathies [:jimm]

Reporter

Comment 44

•

7 years ago

Here's a related profile for latest NVDA 2017.3 on the same low end hardware - https://perfht.ml/2iittEE Generally usable browser, minor page load jank on WWI but overall no major problems.

Comment 45

•

6 years ago

some recent jaws updates brought perf improvements: https://www.freedomscientific.com/downloads/JAWS/JAWSWhatsNew#Enhancements jim, can you still reproduce with the steps in comment #0?

Flags: needinfo?(jmathies)

Jim Mathies [:jimm]

Reporter

Comment 46

•

6 years ago

I think it's safe to close this out now. We've done a ton of perf work for JAWS, and continue to. Freedom Scientific continue to improve perf on their end as well.

Status: NEW → RESOLVED

Closed: 6 years ago

Flags: needinfo?(jmathies)

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.