Closed Bug 1707310 Opened 3 months ago Closed 2 months ago

shadow dom host styles randomly not applying on initial render

Categories

(Core :: CSS Parsing and Computation, defect)

Firefox 88
defect

Tracking

()

VERIFIED FIXED
90 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox88 --- wontfix
firefox89 + verified
firefox90 --- verified

People

(Reporter: dcarlson, Assigned: emilio)

References

(Regression)

Details

(Keywords: regression)

Attachments

(6 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36

Steps to reproduce:

After upgrading to Firefox 88, rendering of webcomponents inconsistently fails. No code was changed.

Actual results:

The host styles defined in the shadow DOM of a web component will sometimes not apply to the element.

In my attached file you can see I have three icons that are created within a web component, each component has a :host style with display style flex applied to it. All CSS for these icons are exactly the same, yet sometimes the host styles do not apply. If I open the inspector, click into the shadow DOM's style tag to trigger an "edit", and then click out to have the browser re-fire the render, the component is fixed despite no changes.

This behavior is inconsistent. When I refresh, sometimes all host styles apply properly, sometimes various web components on the page will have their host styles not apply. There doesn't seem to be a consistent pattern to which web component faces this issue.

Expected results:

All host styles for all web components should apply consistently.

The Bugbug bot thinks this bug should belong to the 'Core::DOM: Core & HTML' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → DOM: Core & HTML
Product: Firefox → Core

Can you attach an html file that reproduces the issue (even if inconsistently)? If not, at least a URL?

I suspect this might be bug 1707116, but it might also be something else, so something that reproduces the issue would be great so that I can investigate.

Flags: needinfo?(dcarlson)
Component: DOM: Core & HTML → CSS Parsing and Computation

The current example is within our app at https://app.ontraport.com but an account is required. You can make a free trial at this address, no credit card required: https://ontraport.com/signup-pro

If this barrier of entry is too much restriction for your testing, I can work on getting you a pared down code example, but that will have to be later in the week. The bug you linked does look quite similar, so if that update is added please let me know and I will investigate if it resolves the issue. Thank you for your assistance.

Flags: needinfo?(dcarlson)

Hi there. I noticed that the related bug was closed, but we are still encountering issues on v89.0b11

We are still working on a reduced test case, but we went ahead and made some creds for our site that you can use for testing.
If you visit: https://app.ontraport.com

And log in with the following credentials:
Email: vlam+firefox-bug@sendpepper.com
Password: Firefoxbug!

If you click around on the app's navigation bar, you should notice the icons that appear incorrect. Invalid size, incorrectly rotated, incorrect colors.

We use a library called lit element that we build our webcomponents from, it is possible there is a correlation: https://lit-element.polymer-project.org/guide

I'll try to poke, thanks.

Flags: needinfo?(emilio)

I haven't been able to see stuff out of the ordinary on Nightly. I'll try on a clean profile etc, though. On a release build I could see the issue pretty fast. Perhaps my patch hasn't made it to a beta release yet?

Can you reproduce on a Nightly build from https://nightly.mozilla.org?

Flags: needinfo?(emilio) → needinfo?(dcarlson)

I was still able to recreate with Nightly, although it did seem less frequent? Hard to tell I just click back and forth on the different navigation items until it occurs: https://imgur.com/a/Typ65lp

Flags: needinfo?(dcarlson)

Ok, I could repro after a lot of tries on Nightly, on a clean profile. On older builds it reproduces much faster it seems. Bisect says, unsurprisingly: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c6ba3d142fb2334dc0ddd65600c40f897c426b27&tochange=f088cf41fc03ae1acfd956bf9150ca0bdf0a6c1e

Will try to take a look, but by all means if you can find a simpler example that reproduces the bug let me know :)

Assignee: nobody → emilio
Status: UNCONFIRMED → NEW
Ever confirmed: true
Regressed by: 1696447

[Tracking Requested - why for this release]: Would be nice not to ship this regression in 89, if the fix is simple enough

It seems there's something funky with those icon components, in the sense that the stylesheets seem modified dynamically based on attributes like rotate, so the stylesheets end up with different text nodes. Is there any chance you could share the source of them so that I can try to build a more reduced test-case?

Flags: needinfo?(dcarlson)
Flags: needinfo?(emilio)
Attached file web-components.zip

Source code for our web components. These files depend on lit element https://github.com/lit/lit-element and are built with webpack.

This file pulls in our web components that are bundled using webpack at https://app.ontraport.com/js/webcomponents/op-webcomponents.production.js.

The page has instructions and details to how to eventually get the inconsistent problem to appear.

Hi Emilio, I work with David. I've attached an HTML page that can reproduce the issue. Sorry, you just gotta host it yourself somewhere. I've also attached the source code for our web components.

The styles that make use of the rotate attribute is in op-icon.ts. <op-icon> is also used by <op-button-icon>.

Thanks for your quick responses. Let us know what else we can do to help!

Thanks Vinh, that's amazing! I could repro on a local build, and I could speed it up a bit.

On a local build / Nightly this only happens with DevTools open, afaict. This makes sense because devtools forces the stylesheets to be "unique", which in turn causes a bunch of entries to get inserted into the cache. Given this is a bug in the stylesheet cache for shadow DOM, that makes some amount of sense (not that I know what happens, yet).

In your page, there's something that is causing the stylesheets to be unique when you click around in the menu. That causes the caching mechanism not to work as well as it should (it triggers this bug, which is great because we find it, but also it triggers a lot of extra cache misses).

If I find some time after I figure out this bug, I can look into it and see what is triggering that. Usually that is caused by stylesheets getting accessed by CSSOM. Do you have a component that walks the whole DOM looking at .cssRules or such?

The sample page I provided and the op-webcomponents.production file that's pulled in doesn't use .cssRules at all.

We use .cssRules a few places in our web app, but not anything that's called when you're clicking around the nav menu. I even commented out all the chunks that use it and still encounter the issue.

Thanks for digging in tot his!

Severity: -- → S3

Ok, so I think I understand how this happens now. It's kinda gnarly, but it is related to CSSOM after all.

What's going on is that we're getting confused and the cache key between two components incorrectly matches, because of pointer address reuse and CSSOM. So all the stylesheet sharing is getting destroyed in your app by some traversal that you're doing somehow, will attach a JS stack.

That is causing a lot of extra stylesheet contents to be around, and some of them may reuse addresses that are still in the cache because we don't force a recreation of the cascade data when the contents change without writing.

So there are various things to fix here:

  • Read-only access to the OM ideally shouldn't cause us to copy-on-write stylesheets (that's a bit harder than it seems, but seems potentially doable, will look into it).

  • Since we're keeping stylesheet content addresses as the cache key, we need to force-recreate the cascade-data when we deduplicate stylesheets.

Both of these are related as the later would probably be unnecessarily expensive without the former. So I'll work on that.

Flags: needinfo?(dcarlson)

I don't know what fullstory.com is, but whatever they're doing seems somewhat expensive, and is destroying all our style sharing optimizations.

That fullstory.com script being the culprit also explains why I couldn't reproduce this initially, since uBlock Origin blocks it, and I had uBlock origin on my profile.

Depends on: 1711437

This prevents incorrectly reusing cached results when the contents go
away and new contents are allocated with the same address.

Note that these keep alive transitively everything else under them, so
all other medialist keys don't need this.

By making this a proper hashmap it should also improve cache lookup
times if the cache grows too big.

Flags: needinfo?(emilio)

I wrote a fix that doesn't depend on bug 1711437 to hopefully be able to uplift this to beta. Bug 1711437 is still a worth it optimization.

No longer depends on: 1711437

So Full Story is a service we use to collect user usage data. It has the ability to record and playback user sessions and track elements (buttons, inputs, etc) that the user interacts with.

I'm going to try to get a build that disables Full Story and then try to reproduce.

But ultimately, Full Story isn't on the stand alone sample that I provided. Maybe Full Story just exacerbates it.

Will keep you posted.

I created a feature build on a test server that does not load Full Story and the issue still occurs.

Is there anything else I can try to do?

(In reply to Vinh Lam from comment #22)

I created a feature build on a test server that does not load Full Story and the issue still occurs.

On Nightly? Or on release? If on release, that doesn't have the fix for bug 1707116 so it's not totally unexpected.

If you can check on Nightly with a build from here (you can click the "B" near your platform and download the installer / zip / tarball as appropriate) it'd be great, to confirm that the fix is effective.

Flags: needinfo?(dcarlson)
Flags: needinfo?(dcarlson) → needinfo?(vlam)

I was trying on beta version 89b13.

I just went to the build you linked and downloaded the OS X Cross Compiled asan build. The problem looks to be fixed! Awesome work!

I've clicked around the nav numerous times with the DevTools opened and closed and have not noticed messed up icons so far. I've also let my test page site up to 100 attempts and it has not encountered any styling errors.

What are next steps now? How soon can this fix get released to public builds? Right now our app pops a dialog for Firefox users that tells them to use Chrome until this bug is resolved.

Thank you so much for your help!

Flags: needinfo?(vlam)

Also glad you were able to create a fix that still allows us to use Full Story. :thumbs-up:

(In reply to Vinh Lam from comment #24)

I've clicked around the nav numerous times with the DevTools opened and closed and have not noticed messed up icons so far. I've also let my test page site up to 100 attempts and it has not encountered any styling errors.

Great to hear!

What are next steps now? How soon can this fix get released to public builds? Right now our app pops a dialog for Firefox users that tells them to use Chrome until this bug is resolved.

It needs to be reviewed and landed, and then I'll try to get it on beta 89.

FWIW, one workaround you can probably use is to add a dummy media query to the components (just something like @media all { } should do, I think.

That should make sure we never confuse one entry in the cache for the one for another component, as they'd have different lengths.

Thank you so much for your help!

Thank you both for the help coming up with a reproduction for this, it was quite a heisenbug, since it depends on how often and when the memory allocator reuses addresses... :-)

Comment on attachment 9222148 [details]
Bug 1707310 - Refactor the author sheet cache to keep alive the relevant StylesheetContents. r=boris

Beta/Release Uplift Approval Request

  • User impact if declined: comment 0
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: none
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): I tried to make the patch as self-contained as possible to make it upliftable.
  • String changes made/needed: none
Attachment #9222148 - Flags: approval-mozilla-beta?
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2c802e9718f9
Refactor the author sheet cache to keep alive the relevant StylesheetContents. r=boris
Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch

Comment on attachment 9222148 [details]
Bug 1707310 - Refactor the author sheet cache to keep alive the relevant StylesheetContents. r=boris

Fix for an 88 regression that created a webcompat issue, approved for 89 beta 15, thanks.

Attachment #9222148 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Hello, I'm just trying to get some clarity. Do you think the fix will be released with version 89 or 90?

Thank you everyone!

It was merged to the 89 branch above (in comment 31)

Great, thanks for clarifying!

Flags: qe-verify+
QA Whiteboard: [qa-triaged]

Hi, This is Verified as fixed in our latest Nightly build as well as Beta 89.0b15 on Windows, Mac and Ubuntu.

Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-triaged]
Flags: qe-verify+

Thanks for all the help everyone!

You need to log in before you can comment on or make changes to this bug.