Open Bug 1659166 Opened 4 years ago Updated 11 months ago

Very long reflows on Wikipedia Barack Obama page

Categories

(Core :: Layout: Columns, defect)

Performance Impact medium

People

(Reporter: bas.schouten, Unassigned)

Details

(5 keywords)

Attachments

(2 files)

I wonder if this is a regression from stuff like bug 1647332.

Status: ASSIGNED → NEW

[Tracking Requested - why for this release]: Severe perf regression in some Wikipedia pages.

Tentatively moving to Layout: Columns, given Bas says this is a regression from 78, and that matches the time with bug 1647332.

Regressed by: 1647332
Keywords: regression

yeah, this is probably an edge case that may happen in some but not all pages or such.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #4)

yeah, this is probably an edge case that may happen in some but not all pages or such.

We suspect this may also be related to the way we capture and replay. Most Wikipedia pages I looked at quickly (to be fair, n=3) showed disproportionately large reflow times compared to the overall page complexity.

Bug 1658198 comment 22 explains why a page with a column container is slow. However, I don't see font-size: 0 or line-height: 0 on the Wikipedia page.

Bas, could you try the build with my patch in 1658198 applied, and see if the performance is improved?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=62b2260c92fb2708e09cc1c35df4d8f56de7a40b

In general, for Wikipedia pages that have a large "Notes and references" section (like Obama's page), finding the best column-balancing height can be expensive, and it can still take several reflow iterations even after the work in bug 1647332 and bug 1647520. Bug 575614 also has some discussion of multi-column layout performance.
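
To illustrate why balancing a long list can need several reflow iterations, here is a minimal toy sketch. It is not Gecko's implementation: the function names, the unbreakable-item model, and the binary search are all assumptions made for illustration. Each probe stands in for one trial reflow at a candidate column height.

```python
def columns_needed(item_heights, limit):
    """Greedy fill: count the columns used when no column may exceed `limit`.

    Hypothetical model: items are unbreakable and laid out in order.
    """
    columns, used = 1, 0
    for h in item_heights:
        if used > 0 and used + h > limit:
            columns += 1   # item does not fit, start a new column
            used = 0
        used += h
    return columns


def find_balanced_height(item_heights, column_count):
    """Binary-search the smallest height that fits in `column_count` columns.

    Returns (height, probes); each probe walks every item once, much like
    a trial reflow of the whole column container.
    """
    lo, hi = max(item_heights), sum(item_heights)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if columns_needed(item_heights, mid) <= column_count:
            hi = mid       # fits, try a shorter column
        else:
            lo = mid + 1   # overflows, need a taller column
    return lo, probes


if __name__ == "__main__":
    references = [20] * 300   # made-up data: 300 list items, 20px each
    height, probes = find_balanced_height(references, column_count=3)
    print(height, probes)
```

In this model the total cost is roughly the number of probes times the number of items, which is one way to see why a References section with hundreds of entries is more expensive to balance than a short list.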

Severity: -- → S3
Component: Layout: Block and Inline → Layout: Columns

NI for comment 6.

Flags: needinfo?(bas)

Not really. The column display on the left with the languages still has a 700ms delay to display vs Chrome with that build. Fwiw, I was able to reproduce this across 4 different machines and across release, beta and nightly :). So it should be easy to verify.

Flags: needinfo?(bas)

To make matters worse, in some cases this reflow happens before first paint. See a profile from release here: https://share.firefox.dev/3axrPK7

We're building the 80 release candidate today; I can track this, but won't block on it.

I'm trying to use mozregression to capture a profile on my end. I'm looking at the second (longest) nsColumnSetFrame::Reflow in both profiles in the "Flame graph" tab.

  • Firefox 78 (2020-05-15) https://share.firefox.dev/3kTfcO9. The first and second nsColumnSetFrame::Reflow take 19ms and 592ms.
  • Firefox 81 (2020-08-16) https://share.firefox.dev/348C9XN. The first and second nsColumnSetFrame::Reflow take 64ms and 319ms. (Not sure why the green posix_fallocate in the graphics category takes nearly 1 minute on current Nightly.)

I profiled both builds a few times, and the longest nsColumnSetFrame::Reflow takes roughly the same amount of time each time: about 5xx ms (2020-05-15) and 3xx ms (2020-08-16). I'm sure Firefox can be slower than Chrome at multicol layout, but I'm skeptical that the slowness is caused by bug 1647332.

Bas, could you help take a look at the profiles I captured, and see if I misinterpreted the data?
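
As a quick check of the numbers above, the following sketch computes the relative change between the two builds. It uses only the four durations quoted in this comment; the helper name `percent_change` is made up for illustration, and a single pair of profiles is far too small a sample to draw firm conclusions from.

```python
# Durations (ms) of the first and second nsColumnSetFrame::Reflow,
# as reported in the comment above.
fx78 = (19, 592)   # Firefox 78 (2020-05-15)
fx81 = (64, 319)   # Firefox 81 (2020-08-16)


def percent_change(old, new):
    """Relative change from old to new in percent (negative means faster)."""
    return (new - old) / old * 100


longest_change = percent_change(fx78[1], fx81[1])
combined_change = percent_change(sum(fx78), sum(fx81))

print(f"longest reflow: {longest_change:.1f}%")
print(f"both reflows combined: {combined_change:.1f}%")
```

On these figures the longest reflow is shorter in the later build, not longer, which is the basis for doubting that bug 1647332 made things worse.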

Flags: needinfo?(bas)

(In reply to Ting-Yu Lin [:TYLin] (UTC-7) from comment #11)

I'm trying to use mozregression to capture a profile on my end. I'm looking at the second (longest) nsColumnSetFrame::Reflow in both profiles in the "Flame graph" tab.

  • Firefox 78 (2020-05-15) https://share.firefox.dev/3kTfcO9. The first and second nsColumnSetFrame::Reflow take 19ms and 592ms.
  • Firefox 81 (2020-08-16) https://share.firefox.dev/348C9XN. The first and second nsColumnSetFrame::Reflow take 64ms and 319ms. (Not sure why the green posix_fallocate in the graphics category takes nearly 1 minute on current Nightly.)

I profiled both builds a few times, and the longest nsColumnSetFrame::Reflow takes roughly the same amount of time each time: about 5xx ms (2020-05-15) and 3xx ms (2020-08-16). I'm sure Firefox can be slower than Chrome at multicol layout, but I'm skeptical that the slowness is caused by bug 1647332.

Bas, could you help take a look at the profiles I captured, and see if I misinterpreted the data?

What's up with that 1s rasterize on the second profile?

Mind you, I was looking at Windows, which doesn't show these weirdly long reflows. It's pretty hard to compare your first and second profile, but most certainly both have really long reflows (I didn't see a really long reflow in 78, but I only ran 78 a couple of times, so it could be a coincidence). Still, the 78 reflow as a whole is 'considerably faster' than the 81 reflow (n=1, and we appear to be seeing different frames in both profiles, so this could be a coincidence).

As for bug 1647332, I'm not claiming it's necessarily related to this :). That was Emilio's first guess I think. But this reflow duration is most definitely problematic. I don't know enough about layout to make a guess as to another cause.

Flags: needinfo?(bas)

Thanks Bas.

I just tried profiling on macOS, and I don't see the weirdly long paint of posix_fallocate.

Yeah, I agree the long reflow of Wikipedia pages is a known problem, caused by our current implementation for finding the best column block-size. That motivated me to implement heycam's ideas to make it faster in bug 1647332, although we still have plenty of room to improve it.

As for bug 1647332, I'm not claiming it's necessarily related to this :). That was Emilio's first guess I think. But this reflow duration is most definitely problematic. I don't know enough about layout to make a guess as to another cause.

Per the above, I'll remove the tracking flags and bug 1647332 from the "Regressed by" field.

Pretty sure Bas tested this on an earlier build (78-ish) and didn't see the same issue on his Windows machine. It's probably worth mozregression'ing this to see what made this worse, given the size of the issue on Windows. I don't know if Wikipedia serves different markup/CSS depending on UA and that somehow affects this, but either way it's probably worth getting a better idea of what's going on here.

Firefox 81 (2020-08-16) https://share.firefox.dev/348C9XN. The first and second nsColumnSetFrame::Reflow take 64ms and 319ms. (Not sure why the green posix_fallocate in the graphics category takes nearly 1 minute on current Nightly.)

FYI, the posix_fallocate shown on the profiler is bug 1658847.

Keywords: regression

Hi, I'm happy to help by running mozregression. Can I get some STR in order to do so?

Thanks!
Best,
Clara.

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Clara Guerrero from comment #16)

Hi, I'm happy to help by running mozregression. Can I get some STR in order to do so?

I think Bas is best placed to give STR here.

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(bas)
QA Whiteboard: [qa-regression-triage]

(In reply to :Gijs (he/him) from comment #17)

(In reply to Clara Guerrero from comment #16)

Hi, I'm happy to help by running mozregression. Can I get some STR in order to do so?

I think Bas is best placed to give STR here.

Basically just open the Obama Wikipedia page. In the 'bad' case the page loads slower; in particular, the column on the left with the languages comes in later. Fwiw, this seems to be load-order dependent or something along those lines, as it doesn't 'always' happen; it sort of comes and goes. Which means it's possible I just got lucky when testing on an older version.

Flags: needinfo?(bas)
Attached video left column(1).webm

Can you please confirm the first attempt reflects the issue, and second attempt loads fine?
Best,
Clara

Flags: needinfo?(bas)

(In reply to Clara Guerrero from comment #19)

Created attachment 9176388 [details]
left column(1).webm

Can you please confirm the first attempt reflects the issue, and second attempt loads fine?
Best,
Clara

Indeed, that's what it looked like for me!

Flags: needinfo?(bas)
Attached video obama website.webm

So, I'm trying to get a regression range, but I noticed that Chrome version 85.0.4183.121 (Official Build) (64-bit) also behaves as shown in my video (Internet Explorer version 11.1082.18362.0 and Microsoft Edge 44.18362.449.0 don't show this behaviour, though). Please confirm whether it's a good idea for me to try to obtain a range with mozregression.
Best,
Clara

Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(aethanyc)

I think if you can find a build that has the "good" behaviour from comment 20 then it may be worth running mozregression.

Flags: needinfo?(gijskruitbosch+bugs)

Sorry for being late to reply. I agree with Gijs. If we used to perform better but ended up behaving like other browsers, it is still helpful to identify the regressor.

Flags: needinfo?(aethanyc)

Bas - is this still an issue?

Flags: needinfo?(bas)

Yes, we are still seeing very slow reflows on this site.

This one includes a 749ms reflow on a cold page load:
https://share.firefox.dev/3QFlnrc

This looks to be one of the reasons why our performance relative to Chrome is not good on this site: https://faraday.basschouten.com/mozilla/Pageload/details.html?os=linux

This try push includes results and profiles for all platforms:
https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=LeXlSApURpOSPaBkqrz3nQ.0&tier=1%2C2%2C3&revision=354e1b0b7958c8815fd284f3861a534bb0b051eb

Flags: needinfo?(bas)

The Performance Impact Calculator has determined this bug's performance impact to be high. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: [x] Windows [x] Linux
Page load impact: Some
Websites affected: Major
[x] Able to reproduce locally

Performance Impact: --- → high

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:TYLin, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to "?"?

For more information, please visit BugBot documentation.

Flags: needinfo?(aethanyc)

(In reply to Ting-Yu Lin [:TYLin] (UTC-8) from comment #6)

Bug 1658198 comment 22 explains why a page with a column container is slow. However, I don't see font-size: 0 or line-height: 0 on the Wikipedia page.

Bas, could you try the build with my patch in 1658198 applied, and see if the performance is improved?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=62b2260c92fb2708e09cc1c35df4d8f56de7a40b

In general, for Wikipedia pages that have a large "Notes and references" section (like Obama's page), finding the best column-balancing height can be expensive, and it can still take several reflow iterations even after the work in bug 1647332 and bug 1647520. Bug 575614 also has some discussion of multi-column layout performance.

Would this patch still be potentially helpful?

Re comment 25:

In the profile, I see we spend a lot of time in nsColumnSetFrame::FindBestBalanceBSize. I improved the column balancing performance a bit a few years ago, and I don't have any other ideas for improving it further at the moment.

Re comment 26:

Performance Impact Calculator deems this bug's performance impact to be high because Wikipedia is a major site. However, not all Wikipedia pages have hundreds of list items in the References section like Barack Obama's page. Spending ~1 second of reflow time in column balancing is bad, but it is not bad enough to become a noticeable jank or delay imho.

Andrew, do you feel the performance impact is still high per my explanation above?

Re comment 28:

The patch in the try run has landed in bug 1658198.

Flags: needinfo?(aethanyc) → needinfo?(acreskey)

(In reply to Ting-Yu Lin [:TYLin] (UTC-8) from comment #29)

Re comment 25:

In the profile, I see we spend a lot of time in nsColumnSetFrame::FindBestBalanceBSize. I improved the column balancing performance a bit a few years ago, and I don't have any other ideas for improving it further at the moment.

Re comment 26:

Performance Impact Calculator deems this bug's performance impact to be high because Wikipedia is a major site. However, not all Wikipedia pages have hundreds of list items in the References section like Barack Obama's page. Spending ~1 second of reflow time in column balancing is bad, but it is not bad enough to become a noticeable jank or delay imho.

Andrew, do you feel the performance impact is still high per my explanation above?

Yes, that's a fair point, not all of wikipedia.org exhibits this performance discrepancy.
I've re-run it and it comes in at medium.

Because this particular page is part of our pageload tests we see the results very frequently.

Performance Impact: high → medium
Flags: needinfo?(acreskey)