Closed Bug 1479196 Opened 6 years ago Closed 6 years ago

Browser crashes with a html file full of emojis, if you drag-and-drop the file from desktop to browser

Categories

(Core :: Graphics: WebRender, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox61 --- unaffected
firefox62 --- unaffected
firefox63 --- fixed

People

(Reporter: mayankleoboy1, Assigned: lsalzman)

References

Details

(Keywords: crash, crashreportid, regression)

Crash Data

Attachments

(8 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Build ID: 20180727231224

Steps to reproduce:

create new nightly profile
enable WR
open the attached html file


Actual results:

instant crash
https://crash-stats.mozilla.com/report/index/e5f1483f-6d16-410c-933d-61d8e0180728


Expected results:

no crash

Without WR, the page uses massive amount of memory, OOM, or just hangs
Attached file em.html
Component: Untriaged → Graphics: WebRender
Product: Firefox → Core
Version: 63 Branch → Trunk
Forgot to add that you need to enable WR
So, this may not be a WR bug per se

If you open the attachement directly as the attachment, it opens fine
BUT, if you download the attachment, and then drag-and-drop it into the browser, you get a crash.
Summary: Browser crashes with a html file full of emojis → Browser crashes with a html file full of emojis, if you drag-and-drop the file from desktop to browser
New STR:

1. create new profile. Enable WR
2. Download the attachment to your desktop
3. Drag-and-drop the file into the browser
OR
menu bar->file->open file and then browse to the downloaded attachment
Component: Graphics: WebRender → IPC
Severity: normal → critical
Crash Signature: [@ mozilla::ipc::ProcessLink::SendMessageW | IPC_Message_Name=PWebRenderBridge::Msg_SetDisplayList ]
Keywords: crash, crashreportid
Unfortunately, I still can't reproduce it with WebRender enabled and the file downloaded.
See Also: → 1479201
(In reply to YF (Yang) from comment #6)
> Unfortunately, I still can't reproduce it with WebRender enabled and the
> file downloaded.

not sure why. Im on Winx64, with Intel gfx
If you do the STR without WR, then the page hangs/rapidly increases memory for me
Attached file memory-report.json.gz
Attached file aboutsupport.txt
I think this seems to be html parser problem.

If wrong charset is specified,
With WebRender:  tab crashes
Without WebReder: browser become unresponsive for long period. and finally render text(Of course it is mojibake)
In nightly63.0a1, 
If wrong charset is specified (attachment 8995725 [details]),
With WebRender:  tab crashes
Without WebReder: browser become unresponsive for long period. and finally render text(Of course it is mojibake)


Un affected on Firefox62.0b12 and release61 and 60ESR.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: perf
Keywords: regression
(In reply to YF (Yang) from comment #6)
> Unfortunately, I still can't reproduce it with WebRender enabled and the
> file downloaded.


In your environment, because browser detected charset successfully.
Please try attachment 8995725 [details].
Regression window(crash with WebRender):
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=379a933da614e9176070e28522f486ea18ded5e1&tochange=36bc055f7ad075c17fd03781f1f1a074b00ccb42

regressed by: 36bc055f7ad0	Lee Salzman — Bug 1400917 - render missing glyphs with WebRender. r=gankro

:lsalzman,
Your patch seems to cause the crash, Could you look into this?


(about performance problem for non-WebRender, I will file an another new bug later)
Blocks: 1400917
Flags: needinfo?(lsalzman)
Keywords: perf
any idea why there is a behavioural difference between opening the file directly from the bugzilla link, and downloading it and then opening it in browser?
(In reply to Mayank Bansal from comment #14)
> any idea why there is a behavioural difference between opening the file
> directly from the bugzilla link, and downloading it and then opening it in
> browser?

Probably, the difference is auto-charset detection.


I can reproduce when open link in the bug and bookmark as well as D&D with attachment 8995725 [details].
See Also: → 1479213
Component: IPC → Graphics: WebRender
(In reply to Alice0775 White from comment #13)
> Regression window(crash with WebRender):
> https://hg.mozilla.org/integration/mozilla-inbound/
> pushloghtml?fromchange=379a933da614e9176070e28522f486ea18ded5e1&tochange=36bc
> 055f7ad075c17fd03781f1f1a074b00ccb42
> 
> regressed by: 36bc055f7ad0	Lee Salzman — Bug 1400917 - render missing glyphs
> with WebRender. r=gankro
> 
> :lsalzman,
> Your patch seems to cause the crash, Could you look into this?

Specifically, it looks like when there are lots of missing-glyph hexboxes to draw, we crash with "IPC message size is too large":

https://crash-stats.mozilla.com/report/index/27c7e9a5-0d01-47c9-902f-082370180730
Flags: needinfo?(lsalzman)
Hey David, this looks like a great stress test related to web renderer and page rendering - an IPC message from WR is hitting a max size constraint. Seems like something your team might want to investigate for usefulness.
Flags: needinfo?(dbolter)
See Also: → 1480134
I will look into optimizing the tofu rendering so that we're not bumping into the IPC message limit so easily.
Flags: needinfo?(dbolter)
This should drop the number of rectangles we're sending to WR down to half of what we were sending before, which should substantially reduce the memory/serialization pressure we're experiencing now. While eventually we probably want to move past the rectangle approach, this will at least keep us going for a while, I think, until then.
Attachment #8998409 - Flags: review?(jfkthame)
While this should reduce the pressure, it doesn't seem to go far enough to really resolve the issue here; I tried applying the patch locally, and still crash if I try to load the example page with incorrect encoding (resulting in lots of hexboxes) and webrender enabled.

Could we arrange to flush the WR pipeline periodically if a bunch of missing glyphs are being drawn, so that we don't accumulate many thousands of them into a single IPC message?
ISTM that drawing missing-glyph boxes with WR is currently not at all performant, and unless we can substantially improve it, our best option for the moment might be to fall back to non-WR rendering (revert bug 1400917) for now.

As an experiment, I tried simplifying further commenting-out the drawing of the hex digits altogether, so that we just draw an empty box for each missing glyph. This was enough to avoid the crash with the mis-encoded example here, but the rendering feels distinctly laggy.

Here's a profile of reloading the page (from a local file) with WR enabled:

  https://perfht.ml/2nklaej

For comparison, with WR disabled, the page renders much more snappily; here's the corresponding profile:

  https://perfht.ml/2nkJhcQ

It seems clear from the WR-enabled profile that painting lots of missing-glyphs, even when they're simplified to the point of just being empty rectangles, is really hurting us. :(
This builds off the support for WR color-masked images in https://github.com/servo/webrender/pull/2969 and allows the defining and use of WR images necessary for implementing the missing glyph atlas in a subsequent patch.

The parameters necessary to plumb that don't exactly fit into the normal DrawTarget interface, so in the spirit of avoiding square-peg/round-hole hacks, this just adds straight-forward DefineImage and PushImage calls, as there wasn't a sane way to map DrawSurface to them unfortunately.

While I was at it, I updated StrokeRect to map to a PushBorder command instead of multiple PushLine commands, as we need all the help we can get to reduce IPC traffic here.
Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Attachment #8999722 - Flags: review?(a.beingessner)
As noted, this gets rid of the rectangle approach entirely and instead tries to draw each glyph with a single DrawSurface call. This is as opposed to using MaskSurface, which isn't necessarily well-supported by all DrawTargets, or as performant, and doesn't have all the necessary scaling and filtering options required implement.

For WR, there is a bit of complexity since we have no easy way to support transforms, but on the other hand we do easily support color-masking. So for a given writing mode transform, of which there are only a few, we need to generate a new atlas on demand to source from. Otherwise, there is just basic plumbing to make sure that when we purge the atlas, we clear the associated WR image resource, or that when the WR layer manager goes away, we clean up any stale WR image keys that were getting used with WR.
Attachment #8999725 - Flags: review?(jfkthame)
Comment on attachment 8999725 [details] [diff] [review]
draw missing glyphs from an atlas instead of rectangles

Review of attachment 8999725 [details] [diff] [review]:
-----------------------------------------------------------------

In general this looks like a good step forward, but I'm concerned about the use of a number of global variables to maintain state. Can you rework it to avoid that issue (or explain why it isn't actually a problem)?

::: gfx/thebes/gfxFontMissingGlyphs.cpp
@@ +76,5 @@
> +
> +static RefPtr<DrawTarget> gGlyphDrawTarget;
> +static RefPtr<SourceSurface> gGlyphMask;
> +static RefPtr<SourceSurface> gGlyphAtlas;
> +static Color gGlyphColor;

These global variables don't appear very thread-friendly. I thought we had parallel OMTP now, which (presumably) means there could be multiple paint threads wanting to draw missing glyphs at the same time, which looks like it'd lead to chaos here.

@@ +138,5 @@
> +}
> +
> +static RefPtr<SourceSurface> gWRGlyphAtlas[8];
> +static nsTArray<layers::WebRenderLayerManager*> gWRUsers;
> +static UserDataKey gWRUserDataKey;

I guess it's the global nsTArray here that triggers all the leak reports on your try run. Can we allocate this on first use instead of statically, and then delete it in the Shutdown method?

And again, thread-safety for P-OMTP seems like a concern here.
v2, used a LinkedList<WRUserData> instead of an nsTArray to get rid of the leak issue.

Also, regarding P-OMTP, this shouldn't be a problem, as P-OMTP is not going to operate at this level. Rather, it records into a DrawTargetCapture as the target, and then replays that in multiple tiles. This code will only get executed on a single thread.
Attachment #8999725 - Attachment is obsolete: true
Attachment #8999725 - Flags: review?(jfkthame)
Attachment #8999997 - Flags: review?(jfkthame)
Comment on attachment 8999997 [details] [diff] [review]
draw missing glyphs from an atlas instead of rectangles

Review of attachment 8999997 [details] [diff] [review]:
-----------------------------------------------------------------

OK, that sounds fair enough.

One bug did catch my eye; aside from that, looks reasonable, but I'd quite like to try it and see how it behaves under stress. Do you expect to get the related WebRender PR merged sometime soon, or should I just try applying it locally for a test build?

::: gfx/thebes/gfxFontMissingGlyphs.cpp
@@ +267,5 @@
> +    uint32_t key = 0;
> +    // Encode orientation in the key.
> +    if (aMat) {
> +        if (aMat->_11 == 0) {
> +            key |= (1 << 4) | (aMat->_12 < 0 ? 1 : 0) | (aMat->_21 < 0 ? 2 : 0);

I think `(1 << 4)` here should just be `4`, otherwise the resulting key may be out of bounds for the gWRGlyphAtlas array.
v3, fix 1<<4 bug.
Attachment #8999997 - Attachment is obsolete: true
Attachment #8999997 - Flags: review?(jfkthame)
Attachment #9000072 - Flags: review?(jfkthame)
This is just a quick patch for WebRender colored image support that you can use to test the mini-font atlas patches, until the upstream WR PR actually gets merged.
Thanks, that's useful.

I tried the testcase in a local (opt) build, using a locally-saved copy of the testcase, with charset=iso-8859-1 so that it loads with the wrong encoding and shows a tofu glyph as part of each emoji.

It's better, in that the testcase no longer crashes; but the performance is still a big problem -- to the extent that it takes several seconds to render, and the whole browser UI (i.e. the chrome process, not just the tab content) becomes pretty unresponsive while the testcase is open (just moving the mouse around, trying to access a menu, etc., I get frequent cursor beachballing, and extremely slow responses, like several seconds to register a click on a menu).

Here's a profile of loading the testcase.[1] We can see that RenderLayers in the content process is taking 1.8 seconds (ouch!), and we can also see a big lag between this and the compositor finally doing some work -- which I presume is largely because sending all this over IPC takes much too long.

For comparison, if I patch gfxFont::DrawMissingGlyph to just call textDrawer->FoundUnsupportedFeature(), the testcase renders a lot more quickly (profile [2]). It's still not great: there's a noticeable delay in the initial rendering, and trying to select text is terribly sluggish -- like responsiveness is delayed by a second or more -- but it's a lot better than the WR-image version.

(With WebRender disabled altogether, it's better still.[3] The testcase renders snappily, and there's just a bit of UI lagginess, e.g. when drag-selecting in the tofu-sprinkled text. But very usable.)

So while this definitely helps, in that it means we don't crash (though presumably a larger testcase could still break the IPC), I don't think it really gets far enough; we'd still be better off disabling the WR-tofu rendering.

[1] https://perfht.ml/2nGWCMW -- WR enabled, drawing tofu as images
[2] https://perfht.ml/2MsjDB3 -- WR enabled, but tofu flagged as unsupported
[3] https://perfht.ml/2ODwdLm -- WR disabled
Comment on attachment 9000072 [details] [diff] [review]
draw missing glyphs from an atlas instead of rectangles

Review of attachment 9000072 [details] [diff] [review]:
-----------------------------------------------------------------

OK, after discussion on IRC and testing with some less-pathological examples, it looks like the original testcase here is particularly bad because of how the (mis-encoded) text all ends up as one massively-long line, but with a scattering of tofu glyphs -- even a large number of them -- among more "normal" content, we don't see anywhere near such bad perf characteristics.

So let's go with this, to fix the crashing issue (or at least make it substantially harder to reach), and considerably improve perf compared to the current implementation.

Further reducing the IPC overhead in extreme cases (like the testcase here) would be nice if we can do it sometime, but we can leave that as a future enhancement.

::: gfx/thebes/gfxFontMissingGlyphs.cpp
@@ +120,5 @@
> +{
> +    // Get the opaque color, ignoring any transparency which will be handled later.
> +    Color color(aColor.r, aColor.g, aColor.b);
> +    if ((gGlyphAtlas && gGlyphColor == color) ||
> +        MakeGlyphAtlas(color)) {

This would actually be more readable without wrapping the line, IMO.
Attachment #9000072 - Flags: review?(jfkthame) → review+
Attachment #8998409 - Flags: review?(jfkthame)
Attachment #8999722 - Flags: review?(a.beingessner) → review+
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9c610000042a
add support to TextDrawTarget for defining and pushing WR images. r=nical
https://hg.mozilla.org/integration/mozilla-inbound/rev/07dabfefeef1
draw missing glyphs from an atlas instead of rectangles. r=jfkthame
https://hg.mozilla.org/mozilla-central/rev/9c610000042a
https://hg.mozilla.org/mozilla-central/rev/07dabfefeef1
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Depends on: 1485358
Depends on: 1485360
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: