Closed Bug 1530190 Opened 2 years ago Closed 1 year ago

nsIconChannel on Windows can flush layout at a bad time.

Categories

(Core :: ImageLib, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox-esr60 --- wontfix
firefox-esr68 71+ fixed
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- fixed

People

(Reporter: emilio, Assigned: tnikkel)

References

(Blocks 2 open bugs)

Details

(Keywords: regression)

Crash Data

Attachments

(9 files, 2 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
22.61 KB, text/plain
Details

This is the nsIconChannel issue in bug 1438939. I filed a different bug for the other issue I'm aware of, which is bug 1530188.

Priority: -- → P3
Duplicate of this bug: 1541427
Crash Signature: [@ mozilla::RestyleManager::ContentStateChanged]
Assignee: nobody → tnikkel
Duplicate of this bug: 1544818
Crash Signature: [@ mozilla::RestyleManager::ContentStateChanged] → [@ mozilla::RestyleManager::ContentStateChanged] [@ mozilla::PresShell::ContentStateChanged ]
Keywords: regression

Changing the priority to p1 as the bug is tracked by a release manager for the current beta.
See How Do You Triage for more information

Priority: P3 → P1

FWIW I can probably paper over the crash by downgrading the assertion to a diagnostic assert... Do we is accessible from content?

It doesn't seem to be... data:text/html,<img width=16 height=16 src="moz-icon://foo.txt?size=16">

If it's not, and if this is the only offender of the assertion, probably downgrading it at least for now makes sense... WDYT?

Tim, are you working on this bug? Thanks

Flags: needinfo?(tnikkel)

Yeah, but if the expectation is an upliftable patch to beta for this cycle I'm not sure that is a reasonable goal.

Flags: needinfo?(tnikkel)
Depends on: 1545842

There are two reasons for this. One is performance, it can be slow. The second is that is can spin the event loop which can re-enter into things like layout.

This patch uses the image decoder thread pool for that. But this is unsuitable (reason in next patch), the next patch changes it to use the img io thread used for network to pass data to imglib off main thread.

This mutex only protects the iothread, not the rest of the decodepool. There is a different monitor for that in DecodePoolImpl.

Depends on D28419

The unifed tweak I make in the next patch exposes this.

Depends on D28420

We need to call CoInitialize, so we need to include the objbase.h header. That defines some generic boolean stuff, which conflicts with something that libjpeg defines. I tried working around it, but I gave up, this solution seems fine.

Depends on D28421

Windows doesn't like more than one thread (at the same time?) calling SHGetFileInfoW. So we use the same thread to avoid this.

We also need to call CoInitialize on the thread we do this.

Depends on D28422

The system call we need to use (SHGetFileInfo) can't be called from more than one thread at a time. So without adding a bunch of extra unnecessary complexity we can't support both Open and AsyncOpen. Further, SHGetFileInfo can spin the event loop (and we open icon channels in the middle of layout) so it's not even safe to call it synchronously.

nsIconChannel::Open isn't used on a full try run, I don't know enough about networking to know if we can entirely rule out that it is used or not.

Depends on D28423

With nsIconChannel::Open no longer implemented we never pass false for aNonBlocking. We can remove that code and assert that SHFileGetInfo is called off main thread. Note that I would like to assert that we always call it on the same thread, but there isn't a super simple way to assert that.

Depends on D28424

[Tracking Requested - why for this release]:
I don't think we need to track this for 67 anymore because bug 1545842 downgraded the release assert to diagnostic assert.

It's also late in the beta cycle to uplift significant work, so marking as wontfix for 67 since we have a short term solution in place with bug 1545842. Thanks

Would you be able to shed any light on who uses nsIChannel::Open and if any of those users would use it on an icon channel? re https://phabricator.services.mozilla.com/D28424

Flags: needinfo?(jkt)

As stated in other patches in this bug, the system call has to happen off main thread, and it has to always be on the same thread. So we dispatch the task to that thread and synchronously wait on the main thread for it to finish.

Depends on D28423

Sorry I don't really know much more than you do, I was able to assess the code paths and see that we weren't using the legacy format in my change in Bug 1520868 however I don't see anything different to your assement here:
https://phabricator.services.mozilla.com/D28424#860308

Flags: needinfo?(jkt)
Attachment #9059966 - Attachment is obsolete: true
Attachment #9059965 - Attachment is obsolete: true

Looks like this is left over from a partial attempt at making GetStockIcon async like GetHIconFromFile.

But I don't think it needs to be async, the only system call SHGetStockIconInfo doesn't have anything about background thread like SHGetFileInfo.

Depends on D29626

We frequently get more asserts in this test after we increased the async-ness of AsyncOpen.

Before the patches of this bug, in this test we would see 15 calls to nsIconChannel::AsyncOpen, and then a flurry of reflow asserts ("ASSERTION: bad inline size").

After the patches for this bug, sometimes we get all 15 AsyncOpen calls to complete (FinishAsyncOpen) before the asserts start. In that case we get an expected number of asserts. But frequently we get a case where some FinishAsyncOpen calls don't happen until after all of the asserts. In this case we get more asserts than we are expecting.

So it seems the lack of a complete icon image is causing extra reflow churn (it's the same asserts, just more).

Depends on D31013

Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/71bf2b9ca6ea
Call SHGetFileInfo off the main thread in nsIconChannel on Windows. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb340bd557d4
Change description of DecodePool::mMutex to be more accurate. r=aosmond
https://hg.mozilla.org/integration/mozilla-inbound/rev/875d43d35a71
Fix unified build bustage in image/Downscaler.cpp. r=aosmond
https://hg.mozilla.org/integration/mozilla-inbound/rev/8010faead0fd
De-unify image/DecodePool.cpp on Windows. r=aosmond
https://hg.mozilla.org/integration/mozilla-inbound/rev/2d9c02e85710
Use the image io thread to call SHGetFileInfo. r=aosmond
https://hg.mozilla.org/integration/mozilla-inbound/rev/642316b68e6a
Allow nsIconChannel::Open to work on Windows. r=aosmond
https://hg.mozilla.org/integration/mozilla-inbound/rev/1d13cc5dbb8a
Cleanup GetStockIcon from first patch. r=aosmond
https://hg.mozilla.org/integration/mozilla-inbound/rev/17018058233c
Adjust assert expectations on  toolkit/content/tests/chrome/test_bug437844.xul. r=aosmond

\p/

That was supposed to be \o/ :-)

Thanks for fixing this Timothy

Blocks: 1553061

Taking care of the assert in bug 1545842 for 68.

No longer blocks: 1553061
Regressions: 1553061
Regressions: 1556360
See Also: → 1587850
Duplicate of this bug: 622840
Attached file rollup patch for esr68

Comment on attachment 9106756 [details]
rollup patch for esr68

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: 1) it can cause long hangs. as reported in bug 1593137
  1. it can cause crashes, since we run the event loop at a bad time
  2. it could be security maybe, since running the event loop at a bad time is not a good idea
  • User impact if declined: mainly hangs and some crashes
  • Fix Landed on Version: 69
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): this change felt a little risky when landed but it's been on release for a couple of months now with no problems reported. It seems like a lot of patches but a lot of them were split out for easier reviewing, or because the assignee of the bug changed midway through so the work needed to be split up so it could be reviewed. all the patches applied without any changes, nothing else has really touched this code, so unlikely any other changes could interfere.
  • String or UUID changes made by this patch: none
Attachment #9106756 - Flags: approval-mozilla-esr68?
Blocks: 1593137
Comment on attachment 9106756 [details]
rollup patch for esr68

Hard to know the impact but let's take this to avoid the hang/crash, it's been on release for months without a problem.
Attachment #9106756 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+
You need to log in before you can comment on or make changes to this bug.