Open Bug 1725781 Opened 3 years ago Updated 7 months ago

SAP application very slow since 72.0b1, with deeply nested percent-height tables

Categories

(Core :: Layout: Tables, defect)

Firefox 91
defect

Tracking

()

Performance Impact medium

People

(Reporter: joerg.goetting, Unassigned)

References

(Depends on 1 open bug)

Details

(4 keywords)

Attachments

(4 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0

Steps to reproduce:

Attached example html (saved state of an SAP application) runs very slow in current Firefox versions.
Tests with older versions show that the important change occurred in 72.0b1. All older versions including 71.0 do fine. In all versions starting from 72.0b1 including Beta 92.0b3 (32-Bit), Dev 92.0b3 (64-Bit) and Nightly 93.0a1 (2021-08-13) (64-Bit) rendering is very slow.

Actual results:

In the live application, Firefox shows message "The page slows down Firefox ...".
With the saved test html file it takes more than a minute to render the page.

Current Edge and Chrome display the page immediately.
Edge, Chrome had a quite similar performance problem with the same SAP application two weeks ago in the new version 92. The problem was fixed with this bug (included in Chromium 92 in the meantime):
https://bugs.chromium.org/p/chromium/issues/detail?id=1232804
("Issue 1232804: Performance regression in Version 92.0.4515.107 - Page layout with nested flexbox")

Expected results:

The page should be displayed as quickly as up to version 71.0.

The Bugbug bot thinks this bug should belong to the 'Core::Layout: Flexbox' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Layout: Flexbox
Product: Firefox → Core

(In reply to Jörg Götting from comment #0)

Edge, Chrome had a quite similar performance problem with the same SAP application two weeks ago in the new version 92. The problem was fixed with this bug (included in Chromium 92 in the meantime):
https://bugs.chromium.org/p/chromium/issues/detail?id=1232804
("Issue 1232804: Performance regression in Version 92.0.4515.107 - Page layout with nested flexbox")

This could conceivably be a regression from bug 1591925, then (one of the bugs in comment 2's pushlog).

I don't see a lot of flexbox in the stacks for comment 2's profile, though -- so I'm also wondering if this might be a regression from bug 1595199.

Flags: needinfo?(dholbert)
Whiteboard: [qf]

Quite an old regression, it seems, but would be good to get to the bottom of this; presumably there must be some number of other pages/sites also being affected.

Severity: -- → S3
Has Regression Range: --- → yes
Keywords: regression

I'm marking this p1. The reason being that this seems to constitute an indefinite hang. By valid webcontent. That Chrome and Edge have addressed.

Whiteboard: [qf] → [qf:p1:responsiveness]

This standalone testcase (reduced from previously-attached SAP zip file) takes around 1 or 2 seconds to render for me in Firefox Nightly 94.0a1 (2021-09-22), whereas it renders with no noticeable delay in Chrome.

Profile: https://share.firefox.dev/2ZhQQHn (including a 1.7 second reflow)

Flags: needinfo?(dholbert)

I just double-checked the regression range locally (using the original testcase to be on the safe side), and I got tighter range than in comment 2 -- perhaps my platform has finer-grained builds vs. those that were available for comment 2:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9e3f44e87a1a2be927c7fab135653c6b86b982c9&tochange=35436d4e7917bf9d9b96a6173201ca001a8ff7bc

This does not include any flexbox changes, so this definitely isn't a flexbox bug. I suspect this belongs in tables, given that the testcases here are just deeply-nested tables, basically.

Component: Layout: Flexbox → Layout: Tables

This is a further-reduced testcase, which I made from scratch after noticing lots of height:100% on nested table/td in the original testcase (and noticing that the load time rapidly improves if I reduce those).

The layout time here seems to increase exponentially with the level of nesting, so I can make this testcase arbitrarily-bad by just adding a few more levels of <table><td>.

This loads ~instantly in Chrome, FWIW.

Also notable, though -- this is slow in builds from before this bug's regression range. So this particular testcase is a broader / longer-standing bit of pain. I suspect that perhaps the regression here was that we were avoiding this slow path for some common case, and now we are not. (But nonetheless, it was & remains a slow path in Firefox which happens to not be a slow path in Chrome.)

Here's a profile of testcase 3: https://share.firefox.dev/3u2r3OR
It takes ~6 seconds of layout on my machine, with 20 levels of nested tables (and I can make it arbitrarily worse by adding a few additional levels).

In Chrome, it loads with no noticeable delay.

Possibly the same core issue as bug 792508 (with the wrinkle that we had a regression around 2019-11-13 where we changed to fall into this slow path in a case where we previously avoided doing so).

See Also: → 792508
Attachment #9242594 - Attachment description: testcase 3 (just nested 100%-height table/td) → testcase 3 (just nested 100%-height table/td; hangs for ~6 seconds on dholbert's Thinkpad)

Also: I can make older builds (before the regression range) reproduce the issue, with the original testcase, if I add

* { overflow: visible !important; }

So in old builds, we were being "saved" by some scrollable element effectively breaking a nested chain of height:100% elements. And bug 1595199 correctly fixed that by propagating a percent-height-resolution flag through scrollable elements, which means the scrollframe doesn't "save" us anymore.

So, the proximal regression here was from bug 1595199, which was a correctness fix that made us propagate a flag that we should have been propagating all along. And the real lower-level issue here is long-time bug 792508.

Can we retest this after Fission?

Flags: needinfo?(dholbert)

This isn't related to fission. It seems to be a long-standing inefficiency in our <table> layout code.

But in any case, I confirmed it's still slow in Nightly where I have Fission active (confirmed in about:support). Testcase 3 takes ~6 seconds to load on my thinkpad, which matches the behavior observed before.

Flags: needinfo?(dholbert)

Just wanted to understand if this was still an issue post-Fission, not necessarily related to Fission. Thanks!

[edit: inadvertently submitted a partially-written comment]

We might want to dupe this and bug 792508, since they're the same core issue (modulo comment 12).

Whatever we do, testcase 3 here is a nice minimal testcase for whoever works on this to start with; and bug 792508 comment 2 - 3 has some analysis/thoughts from Mats 10 years ago, too.

Summary: SAP application very slow since 72.0b1 → SAP application very slow since 72.0b1, with deeply nested percent-height tables
Performance Impact: --- → P1
Whiteboard: [qf:p1:responsiveness]

The Performance Priority Calculator has determined this bug's performance priority to be P1.

Platforms: [x] Windows [x] macOS [x] Linux [x] Android
Impact on site: Causes noticeable jank
Configuration: Rare
Page load impact: Some
[x] Affects major website
[x] Able to reproduce locally
[x] Bug affects multiple sites

See Also: → 556305

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:dholbert, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to ??

For more information, please visit auto_nag documentation.

Flags: needinfo?(dholbert)

Replying to comment 17 comment 18: In terms of prioritization, I'm not sure the Affects major website / top50 assessment is accurate. I would guess/hope that this is rare in the wild.

SAP is itself a prominent web property, and this bug was reported for a SAP-hosted web application, but importantly there's no indication that this bug affects any substantial portion of SAP applications in general. (If it did, that would be important to know & would be quite concerning, and a severity-bump would totally be merited.)

This perf issue would still be great to fix, but it probably won't happen right away; the fix won't be trivial and will likely include some substantial risk of breakage (and hence will need careful thought/testing/vetting), given this is a long-standing inefficiency with deeply nested <table> elements, which are a pretty-old/crufty part of our codebase.

I'll take the bot's suggestion to reset Performance Priority Calculator to ? (and I'll remove the top50 keyword). On the Performance Priority Calculator, I think the appropriate Websites affected: value is Rare (though Configuration should be General, since this doesn't require any special configuration on the client side). This lowers the perf impact assessment a bit, to "low" when I click through the calculator tool. (If we have signals that this is actually a common issue, I'm of course happy to be corrected on this.)

Performance Impact: high → ?
Flags: needinfo?(dholbert)
Keywords: top50

Thanks @dholbert, we've recalculated the impact as medium.

The Performance Impact Calculator has determined this bug's performance impact to be medium. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: Windows
Impact on site: Causes noticeable jank
Page load impact: Some
Websites affected: Rare
[x] Able to reproduce locally

Performance Impact: ? → medium
Depends on: 1883699
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: