Open Bug 1623221 Opened 5 years ago Updated 3 years ago

Large interactive SVG graph completely deadlocks Firefox

Categories

(Core :: Disability Access APIs, defect, P3)

Unspecified
Windows
defect

Tracking

()

Tracking Status
firefox-esr68 --- wontfix
firefox74 --- wontfix
firefox75 --- wontfix
firefox76 --- fix-optional

People

(Reporter: mark, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Attached image perf.folded.svg

I was curious about Nightly's SVG performance so I attempted to load up a large interactive SVG graph (attached).
Result: the graph is drawn, but Firefox completely deadlocks. The UI and content are unresponsive, and it doesn't respond to close commands from the taskbar either...

The only way to get out of it is by force-killing it from task manager.
Tested on Nightly 76.0a1 (2020-03-17) (64-bit)

The same file renders fine in Chrome, IE (a little slow), and Pale Moon.

Mark, thanks for reporting this. On my Linux box, I don't see much difference between Chrome and Firefox. Could you please take a profile and post a link to the result?

Thank you!

Flags: needinfo?(mark)

https://perfht.ml/33ptPjW with "sequential styling enabled"
Almost all the time is spent in stylo.
Emilio, mind taking a look? To repro, select one graph element at the top, then select a graph element at the very bottom.

Flags: needinfo?(emilio)

I'm on Windows. I'm sorry but taking a profile is likely not going to work because the moment I load the file, Firefox locks up. I don't even have to interact with it. I load up, and the browser locks, consumes ~30% of my 6-core CPU and is unresponsive. Taskmanager-killing is the only way out.

If you know of a way to record a profile in this situation, then please do tell.

Flags: needinfo?(mark)

That seems to be a different case, though there's a lot there worth improving. In particular, that page is doing something like:

  • SVGTextElement.getSubStringLength(...)
  • element.style.display = "<something>"

And so on, once for each element. Seems we can potentially optimize away a lot of those flushes. But that still seems like a different thing from what Mark is seeing... Maybe file a separate bug for this? I have some ideas, though they're non-trivial because this code depends on the frame tree of all the descendants being around.
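To illustrate why that per-element pattern is expensive, here is a toy model. It is a hypothetical sketch, not Gecko code and not the page's actual script: `MockLayout`, `read`, `write`, `interleaved`, and `batched` are invented names. It assumes that a geometry read such as `getSubStringLength()` forces a synchronous layout flush whenever a preceding style write (such as `element.style.display = ...`) has left layout dirty. It counts those forced flushes for the interleaved read-then-write-per-element pattern versus doing all reads before all writes, which is a common page-side mitigation rather than the engine-side optimization discussed above.

```typescript
// Hypothetical model of a layout engine: style writes mark layout dirty,
// and geometry reads force a (costly) synchronous flush if layout is dirty.
class MockLayout {
  private dirty = false;
  flushes = 0;

  // Stands in for a geometry read such as getSubStringLength().
  read(id: number): number {
    if (this.dirty) {
      this.flushes++; // forced synchronous layout flush
      this.dirty = false;
    }
    return id * 10; // placeholder measurement
  }

  // Stands in for a style write such as element.style.display = "...".
  write(_id: number, _display: string): void {
    this.dirty = true;
  }
}

// Interleaved pattern: read, then write, once per element.
// Every read after the first finds layout dirty and forces a flush.
function interleaved(layout: MockLayout, ids: number[]): void {
  for (const id of ids) {
    const len = layout.read(id);
    layout.write(id, len > 50 ? "none" : "inline");
  }
}

// Batched pattern: perform all reads first, then all writes,
// so no read ever sees dirty layout.
function batched(layout: MockLayout, ids: number[]): void {
  const lengths = ids.map((id) => layout.read(id));
  ids.forEach((id, i) => layout.write(id, lengths[i] > 50 ? "none" : "inline"));
}
```

With 10 elements, the interleaved version forces 9 flushes in this model, while the batched version forces none (the pending layout work would be done once, later). The model ignores everything else a real flush involves, such as the frame-tree dependencies mentioned above.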

(In reply to Mark Straver from comment #3)

> I'm on Windows. I'm sorry but taking a profile is likely not going to work because the moment I load the file, Firefox locks up. I don't even have to interact with it. I load up, and the browser locks, consumes ~30% of my 6-core CPU and is unresponsive. Taskmanager-killing is the only way out.

That is really odd. Is that on a clean profile? One would think e10s would prevent that kind of lockup, but maybe this is a graphics issue...

Markus, do you know how to take a useful profile for something like the above? On Linux I'd use perf or something, but...

Mark, does this reproduce with/without WebRender and such? That would at least point to graphics code.

Flags: needinfo?(mstange)
Flags: needinfo?(mark)
Flags: needinfo?(emilio)

I checked: WebRender is disabled by default, and enabling it makes no difference. Disabling HWA in preferences also makes no difference.
When loading from a local file, the graph renders but the browser locks up. When loading it remotely, the graph only partially renders before the lockup.
If I specifically end the content process that is using the most CPU, it behaves as if the tab crashed and I can resume normal operation within Firefox. It doesn't seem to be graphics related, at least not directly.

While loading the SVG, there is a brief period where the overall application process is not responding to Windows (the dreaded "(Not Responding)" suffix), but that goes away again. After that, from Windows' point of view, the process isn't locked up, but it doesn't respond to any input or window operations and uses CPU constantly. Occasionally, the standard Windows title bar shows before being replaced with the Photon interface again after a second or so. This goes on indefinitely, and Firefox doesn't recover on its own.

This is an almost-pristine profile since I only use Nightly on rare occasions to check certain features or do web compatibility research. No extensions are installed. Theme is default.

Flags: needinfo?(mark)

FWIW, I tried it on my Windows laptop (ThinkPad 460p, 4-core Intel CPU and Intel HD 530), and it works fine.

Mark, would you mind trying to reduce the SVG size so that you can take a profile? If the slowness depends on the SVG size, there should be a reasonable size at which rendering is slow but does not deadlock. If it doesn't depend on the size, there must be a specific trigger for the deadlock.

Flags: needinfo?(mark)
Blocks: 1623500

I didn't create this SVG myself and I'm not aware of what tools were used to create it. I'm more than happy to provide more info if I can, but reducing the size of the SVG is likely going to be difficult.

I could potentially try a mozregression run to find the range where it started going wrong, but that will be time-consuming with the lockups. I don't think I have time for that today, and unfortunately I have limited time to sink into this issue. Let me know if that would help, and I'll do my best to pinpoint a regression range.

Flags: needinfo?(mark)

I suspect Hiro meant reducing the window size? Does that have any effect?

No, I meant reducing the number of SVG elements in the file, but reducing the window size might have some effect.

Reducing the window size has no effect, it will just show less of the image, but still locks up.

Okay, I was able to set some time aside to run mozregression. It seems this has been an issue since the Firefox 52 nightlies.
Firefox 52-ESR is, however, not affected.

INFO: Last good revision: 90d8afaddf9150853b0b68b35b30c1e54a8683e7 (2016-10-19)
INFO: First bad revision: 99a239e1866a57f987b08dad796528e4ea30e622 (2016-10-20)
INFO: Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=90d8afaddf9150853b0b68b35b30c1e54a8683e7&tochange=99a239e1866a57f987b08dad796528e4ea30e622

Possibly bug 1310117?

Attached file reduced+profile.zip

I managed to reduce the SVG by manually cutting out <g> elements. Reduced to 89 KB, it still shows the problem, but it gave me just enough free cycles to use devtools to record a profile. The profile seems completely oblivious to any problem, though.
Both attached.

Thanks for the profile and regression range!

Here is a link to the profile result: https://perfht.ml/2vyCGDh

The result looks odd: I am seeing tons of requestAnimationFrame callbacks in the profile, but there are no requestAnimationFrame calls in the SVG.

bug 1310117 is not likely the cause since IIRC at that time Stylo hadn't been enabled on nightly yet.

Flags: needinfo?(mstange)

> bug 1310117 is not likely the cause since IIRC at that time Stylo hadn't been enabled on nightly yet.

That would be extra odd then, since it's something that obviously survived moving most/all of this to stylo... It just stood out to me but I'll leave the analysis up to you guys.

Ok, so the requestAnimationFrame callbacks come from devtools, probably Mark took the profile via devtools. That makes me think that the content process didn't hang at all, the deadlock happened in the parent process or GPU process? I have no further idea what's going on there.

And I couldn't find anything suspicious in the regression range. Bug 1310788 might be related, maybe.

Mark, could you try with setting "accessibility.force_disabled" to true?

Anyway, I'm setting P3 for now since I think this issue rarely happens.

Flags: needinfo?(mark)
Priority: -- → P3

> Ok, so the requestAnimationFrame callbacks come from devtools, probably Mark took the profile via devtools.

I already told you as much in comment 13...

> the deadlock happened in the parent process or GPU process?

Did Firefox that far back already use a dedicated GPU process for e10s?

> Mark, could you try with setting "accessibility.force_disabled" to true?

No, because it's an int ;P
Setting it to 1 or 2 solved the problem though. I don't have any special accessibility hardware or software running since I'm perfectly capable.
So the problem is e10s then?
I'm perfectly happy using this as a workaround for the time being, of course, but it still seems like a bug.

Flags: needinfo?(mark)

(In reply to Mark Straver from comment #17)

>> the deadlock happened in the parent process or GPU process?

> Did Firefox that far back already use a dedicated GPU process for e10s?

Probably no.

>> Mark, could you try with setting "accessibility.force_disabled" to true?

> No, because it's an int ;P
> Setting it to 1 or 2 solved the problem though. I don't have any special accessibility hardware or software running since I'm perfectly capable.
> So the problem is e10s then?

Great! Yes, this is somewhat related to E10S and accessibility stuff.

Component: SVG → Disability Access APIs
Keywords: regression
OS: Unspecified → Windows
Regressed by: 1310788
Has Regression Range: --- → yes

All right then.
If there is any way I can help with this problem, since I can easily reproduce it, just let me know.

Hi, this appears to be related to accessibility?

Flags: needinfo?(jteh)

It seems so, yes, and it's now in the right component. Keeping P3 priority as per comment 16.

I did take a look at the profile, but I can't see anything immediately obvious.

Flags: needinfo?(jteh)

If this happened on a device with a touch screen, or with a connected tablet, there's a chance that bug 1687535 will help.

Depends on: 1687535

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: critical → --

I don't know why, but this seems to be significantly better with Cache the World enabled.

Severity: -- → S3
Depends on: 1737192