SAP application very slow since 72.0b1, with deeply nested percent-height tables
Categories
(Core :: Layout: Tables, defect)
Tracking
()
Performance Impact | medium |
People
(Reporter: joerg.goetting, Unassigned)
References
(Depends on 1 open bug)
Details
(4 keywords)
Attachments
(4 files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0
Steps to reproduce:
Attached example html (saved state of an SAP application) runs very slow in current Firefox versions.
Tests with older versions show that the important change occurred in 72.0b1. All older versions including 71.0 do fine. In all versions starting from 72.0b1 including Beta 92.0b3 (32-Bit), Dev 92.0b3 (64-Bit) and Nightly 93.0a1 (2021-08-13) (64-Bit) rendering is very slow.
Actual results:
In the live application, Firefox shows message "The page slows down Firefox ...".
With the saved test html file it takes more than a minute to render the page.
Current Edge and Chrome display the page immediately.
Edge, Chrome had a quite similar performance problem with the same SAP application two weeks ago in the new version 92. The problem was fixed with this bug (included in Chromium 92 in the meantime):
https://bugs.chromium.org/p/chromium/issues/detail?id=1232804
("Issue 1232804: Performance regression in Version 92.0.4515.107 - Page layout with nested flexbox")
Expected results:
The page should be displayed as quickly as up to version 71.0.
Comment 1•3 years ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::Layout: Flexbox' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.
Comment 3•3 years ago
|
||
(In reply to Jörg Götting from comment #0)
Edge, Chrome had a quite similar performance problem with the same SAP application two weeks ago in the new version 92. The problem was fixed with this bug (included in Chromium 92 in the meantime):
https://bugs.chromium.org/p/chromium/issues/detail?id=1232804
("Issue 1232804: Performance regression in Version 92.0.4515.107 - Page layout with nested flexbox")
This could conceivably be a regression from bug 1591925, then (one of the bugs in comment 2's pushlog).
I don't see a lot of flexbox in the stacks for comment 2's profile, though -- so I'm also wondering if this might be a regression from bug 1595199.
Updated•3 years ago
|
Comment 4•3 years ago
|
||
Quite an old regression, it seems, but would be good to get to the bottom of this; presumably there must be some number of other pages/sites also being affected.
Comment 5•3 years ago
|
||
I'm marking this p1. The reason being that this seems to constitute an indefinite hang. By valid webcontent. That Chrome and Edge have addressed.
Comment 6•3 years ago
|
||
This standalone testcase (reduced from previously-attached SAP zip file) takes around 1 or 2 seconds to render for me in Firefox Nightly 94.0a1 (2021-09-22), whereas it renders with no noticeable delay in Chrome.
Profile: https://share.firefox.dev/2ZhQQHn (including a 1.7 second reflow)
Comment 7•3 years ago
|
||
Comment 8•3 years ago
|
||
I just double-checked the regression range locally (using the original testcase to be on the safe side), and I got tighter range than in comment 2 -- perhaps my platform has finer-grained builds vs. those that were available for comment 2:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9e3f44e87a1a2be927c7fab135653c6b86b982c9&tochange=35436d4e7917bf9d9b96a6173201ca001a8ff7bc
This does not include any flexbox changes, so this definitely isn't a flexbox bug. I suspect this belongs in tables, given that the testcases here are just deeply-nested tables, basically.
Comment 9•3 years ago
|
||
This is a further-reduced testcase, which I made from scratch after noticing lots of height:100%
on nested table/td in the original testcase (and noticing that the load time rapidly improves if I reduce those).
The layout time here seems to increase exponentially with the level of nesting, so I can make this testcase arbitrarily-bad by just adding a few more levels of <table><td>
.
This loads ~instantly in Chrome, FWIW.
Also notable, though -- this is slow in builds from before this bug's regression range. So this particular testcase is a broader / longer-standing bit of pain. I suspect that perhaps the regression here was that we were avoiding this slow path for some common case, and now we are not. (But nonetheless, it was & remains a slow path in Firefox which happens to not be a slow path in Chrome.)
Comment 10•3 years ago
|
||
Here's a profile of testcase 3: https://share.firefox.dev/3u2r3OR
It takes ~6 seconds of layout on my machine, with 20 levels of nested tables (and I can make it arbitrarily worse by adding a few additional levels).
In Chrome, it loads with no noticeable delay.
Comment 11•3 years ago
|
||
Possibly the same core issue as bug 792508 (with the wrinkle that we had a regression around 2019-11-13 where we changed to fall into this slow path in a case where we previously avoided doing so).
Updated•3 years ago
|
Comment 12•3 years ago
•
|
||
Also: I can make older builds (before the regression range) reproduce the issue, with the original testcase, if I add
* { overflow: visible !important; }
So in old builds, we were being "saved" by some scrollable element effectively breaking a nested chain of height:100%
elements. And bug 1595199 correctly fixed that by propagating a percent-height-resolution flag through scrollable elements, which means the scrollframe doesn't "save" us anymore.
So, the proximal regression here was from bug 1595199, which was a correctness fix that made us propagate a flag that we should have been propagating all along. And the real lower-level issue here is long-time bug 792508.
Comment 14•3 years ago
|
||
This isn't related to fission. It seems to be a long-standing inefficiency in our <table>
layout code.
But in any case, I confirmed it's still slow in Nightly where I have Fission active (confirmed in about:support). Testcase 3 takes ~6 seconds to load on my thinkpad, which matches the behavior observed before.
Comment 15•3 years ago
|
||
Just wanted to understand if this was still an issue post-Fission, not necessarily related to Fission. Thanks!
Comment 16•3 years ago
•
|
||
[edit: inadvertently submitted a partially-written comment]
We might want to dupe this and bug 792508, since they're the same core issue (modulo comment 12).
Whatever we do, testcase 3 here is a nice minimal testcase for whoever works on this to start with; and bug 792508 comment 2 - 3 has some analysis/thoughts from Mats 10 years ago, too.
Updated•3 years ago
|
Comment 17•2 years ago
|
||
The Performance Priority Calculator has determined this bug's performance priority to be P1.
Platforms: [x] Windows [x] macOS [x] Linux [x] Android
Impact on site: Causes noticeable jank
Configuration: Rare
Page load impact: Some
[x] Affects major website
[x] Able to reproduce locally
[x] Bug affects multiple sites
Updated•2 years ago
|
Comment 18•2 years ago
|
||
The severity field for this bug is set to S3. However, the Performance Impact
field flags this bug as having a high impact on the performance.
:dholbert, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact
flag to ?
?
For more information, please visit auto_nag documentation.
Comment 19•2 years ago
•
|
||
Replying to comment 17 comment 18: In terms of prioritization, I'm not sure the Affects major website
/ top50
assessment is accurate. I would guess/hope that this is rare in the wild.
SAP is itself a prominent web property, and this bug was reported for a SAP-hosted web application, but importantly there's no indication that this bug affects any substantial portion of SAP applications in general. (If it did, that would be important to know & would be quite concerning, and a severity-bump would totally be merited.)
This perf issue would still be great to fix, but it probably won't happen right away; the fix won't be trivial and will likely include some substantial risk of breakage (and hence will need careful thought/testing/vetting), given this is a long-standing inefficiency with deeply nested <table>
elements, which are a pretty-old/crufty part of our codebase.
I'll take the bot's suggestion to reset Performance Priority Calculator
to ?
(and I'll remove the top50
keyword). On the Performance Priority Calculator, I think the appropriate Websites affected:
value is Rare
(though Configuration
should be General
, since this doesn't require any special configuration on the client side). This lowers the perf impact assessment a bit, to "low" when I click through the calculator tool. (If we have signals that this is actually a common issue, I'm of course happy to be corrected on this.)
Comment 20•2 years ago
|
||
Thanks @dholbert, we've recalculated the impact as medium.
The Performance Impact Calculator has determined this bug's performance impact to be medium. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.
Platforms: Windows
Impact on site: Causes noticeable jank
Page load impact: Some
Websites affected: Rare
[x] Able to reproduce locally
Description
•