Closed Bug 1263083 Opened 4 years ago Closed 4 years ago

Google Inbox is *incredibly* slow after March 7 nightly build.

Categories

(Core :: Graphics: Layers, defect)

48 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox47 --- unaffected
firefox48 + fixed
firefox49 + fixed
firefox50 + fixed

People

(Reporter: dietrich, Assigned: jnicol)

References

Details

(Keywords: perf, regression)

Attachments

(1 file)

STR:

1. Log in to inbox.google.com
2. Do *anything* in the message list (actions below)

Actions:

* use keyboard shortcuts to move up and down the message list
* click on a category to open the list of messages
* click on a message header to open the email

Expected: Actions take 1 second or less to complete. This is what happens with the Mar 7 build.

Actual: Actions take between 3 and 10 seconds.

Environment: Mac. Tried both dirty and new profiles.
Hi David, dcamp pointed me your way for Firefox perf issues these days. Can you route this into the right hands?

We should try to not ship this regression, and the clock is running on that, so appreciate any help.
Flags: needinfo?(ddurst)
Any new or updated plugins?

Which Mac OS X version?
Default plugins for Mar 7 nightly and latest nightly on new profile should be the same, right?

Mac OS X 10.11.3 (15D21).
Any chance you can get a narrower regression window with mozregression? You can use the --profile switch to run it against your usual profile.
Component: General → Untriaged
Flags: needinfo?(dietrich)
I've been experiencing this too, but recent (last 2-3 days) Nightlies actually don't even load Inbox completely.
I filed bug 1263170 for the 'failed to load' issue, which appears in Beta too so it's probably an Inbox change.
I don't reproduce this on my Linux machine so it might be Mac-specific. Are you in e10s mode or not? If you are then can you try running in non-e10s mode and see if the problem goes away?
Hey Gij, sorry I cannot narrow the window right now. I added the keywords so hopefully someone will be able to.
Flags: needinfo?(dietrich)
James, I've not seen bug 1263170 at all. Things load always for me, just takes a lonnggggg time.

Gabriele, yeah testing w/ e10s enabled since that is default in a new profile.

With e10s off: Super slow initial load, including brief beachballing, but otherwise didn't exhibit the bug.
Product: Firefox → Core
Hi,

I have tested this issue on Mac OS X 10.10 with FF Nightly 48.0a1 (2016-04-10) and I can't reproduce the issue. Can you please retest this with the latest build?
(In reply to Dietrich Ayala (:dietrich) from comment #9)
> With e10s off: Super slow initial load, including brief beachballing, but
> otherwise didn't exhibit the bug.

I saw this once last week: a very slow load, after which everything was fine. I can't reproduce it anymore.
Everyone: It's great that it works for you, but the goal is to figure out why it's *not* working for me.

I've been working on and testing Firefox for over 10 years, so asking me to update to latest build and test it again is not productive - I've been living on nightlies forever ;)

In the event that the issue *is* fixed by updating to a recent build, then we *still* need to know what regressed the behavior and also what fixed it.

Before commenting that it works for you, please test the exact scenario I reported, and also compare the Mar 7 build vs most recent nightlies. I'm on the latest version of Mac OS X El Capitan on a new MacBook (the tiny ones) - which may or may not be relevant, so make a note of the hardware you're testing on also.

Examples of commands that are excruciatingly slow for me are "o" to open categories, "e" to mark an item as done, and "z" to undo. If I go back to Mar 7 nightly, everything is much much faster.
Dietrich -- so does this only happen to you on the March 7 build? In case we are reducing to a specific build (and possibly platform), I'd like to target accordingly.
Flags: needinfo?(ddurst)
Heh, exactly the opposite. Inbox performs normally on Mar 7 build, and later builds perform terribly. I don't know that the problem started on the Mar 8 build though - all I could find that was older at the time I reported was Mar 7, which works fine so I stuck with it ;)

We really just need someone from QA who's got the mozregression stuff installed and dialed to try and reproduce and narrow the window of regression.
So I just checked this on my local (Mac (late 2013 15", though), OSX 10.11.3, current Nightly) and I can't reproduce it either.

If we're going to bisect, we can start with bisecting the nightlies from Mar 7 and Mar 8. I can try replicating on the older Nightly but ymmv -- I'm brand new to this and still ramping in. Someone may find the issue via bisection faster than I'll have a new profile set up...
Tested again on MacBook Pro (Retina, 15-inch, late 2013) OS X 10.11 with the latest Nightly (2016-04-11) and I can't reproduce this issue.
(In reply to Dietrich Ayala (:dietrich) from comment #14)
> Heh, exactly the opposite. Inbox performs normally on Mar 7 build, and later
> builds perform terribly. I don't know that the problem started on the Mar 8
> build though - all I could find that was older at the time I reported was
> Mar 7, which works fine so I stuck with it ;)
> 
> We really just need someone from QA who's got the mozregression stuff
> installed and dialed to try and reproduce and narrow the window of
> regression.

Considering they can't reproduce, any chance you can try using ./mach mozregression yourself to reproduce the issue on your local machine and get an inbound branch (fx-team, m-i, b2g-i or whatever) regression window? It seems like having such a window might also shed some light on why others aren't reproducing...
Flags: needinfo?(dietrich)
Sorry, but I've been down this rabbit hole before, and I can't chase Firefox regressions *and* do my day job. It's both incredibly time consuming and depressing.

I hope filing this bug will help in some future investigation. Good luck and thanks in advance to future spelunkers.

We used to have staff QA and a community of testers that they'd grown, that had their environments set up for this and had the tools dialed in, but afaict we don't have either anymore?
Flags: needinfo?(dietrich)
Closing as WFM, since nobody else can reproduce.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
(In reply to Dietrich Ayala (:dietrich) from comment #18)
> We used to have staff QA and a community of testers that they'd grown, that
> had their environments set up for this and had the tools dialed in, but
> afaict we don't have either anymore?

We still do, and some of them have commented here...
I can definitely still repro this. I looked into it a bit and it seems like layerization is going crazy. My guess is that we end up putting every message into its own layer. Matt/Jamie can one of you look?
Status: RESOLVED → REOPENED
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jnicol)
Resolution: WORKSFORME → ---
Component: Untriaged → Graphics: Layers
I'm on the case
Assignee: nobody → jnicol
Flags: needinfo?(jnicol)
I cannot personally reproduce the noticeable performance regression; each of these operations always takes about 1 second for me. I did, however, notice a change in how it layerizes on the 9th. Dietrich, is it correct that the 7th is the last good build *you are aware of*, but that doesn't mean the 8th is necessarily the first bad build? That's how I read the above comments, and it is consistent with my findings.

The change I notice is that we are using more multi-tiled layers than before. This would explain why it only affects OS X, although even with tiling enabled on Linux I cannot reproduce a noticeable performance regression. I ran a bisection looking out for the extra multi-tiled layers and found bug 1221094 to be the culprit, which makes sense. Memory usage after that commit is greater than before. Each of the operations listed as slow now requires more memory allocations. (Perhaps on a hi-dpi display this would be even more pronounced and therefore cause a more noticeable slowdown.)

The commit in question usually causes positive improvements in my experience. But I think the combination of it along with some existing bad decisions on this page is what's causing problems. There should be a way to layerize this page better, I'll keep investigating.
Sorry for the brain dump, but here's another. Each email is given its own layer. Previously these were single-tiled layers, so they only used memory for the size of the layer. But since bug 1221094 we use multi-tiled layers for them. So each of these short and wide layers is using tiles that are far too tall for it, hence the massive increase in memory usage. Bug 1243589 made it so that we don't use multi-tiled layers when the layers are smaller than the tile size, but only when both the width AND height are smaller, whereas these layers are short but wide.

We need to update that logic to consider width and height separately. I'm not sure what the correct heuristic would be, but these definitely shouldn't be multi-tiled. That would fix this regression.

But I also want to find out why each email is getting its own layer in the first place. I'd guess it's something to do with them being transformed, but ideally they'd all be in the one large multi-tiled layer, at least until you hover over one and it gets animated.
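To put rough numbers on the memory difference described above, here is a minimal Python sketch. It is not Gecko code. The 512x512 tile size, the 1900x60 layer, and 4 bytes per pixel are illustrative assumptions rather than values taken from this bug, and the function names are hypothetical.

```python
import math

BYTES_PER_PIXEL = 4  # assumption: 32-bit RGBA surfaces


def single_tiled_bytes(width, height):
    """Memory for one buffer sized exactly to the layer."""
    return width * height * BYTES_PER_PIXEL


def multi_tiled_bytes(width, height, tile_w, tile_h):
    """Memory when the layer is covered by whole fixed-size tiles."""
    tiles_across = math.ceil(width / tile_w)
    tiles_down = math.ceil(height / tile_h)
    return tiles_across * tiles_down * tile_w * tile_h * BYTES_PER_PIXEL


# Hypothetical short, wide email row and an assumed 512x512 tile.
single = single_tiled_bytes(1900, 60)
multi = multi_tiled_bytes(1900, 60, 512, 512)
print(single, multi, round(multi / single, 1))
```

Under these assumptions the single-tiled row needs 456,000 bytes, while the multi-tiled version needs four full tiles, or 4,194,304 bytes, roughly nine times as much. Almost all of the extra memory comes from tile height that the 60-pixel-tall row never uses.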
Thanks, Jamie. I am on a 4k display and the window is sized to be the entire height and half the width of the display. After your comment, I found that reducing the width substantially improved performance. This likely made each mail fit into one tile, so I think you are on the right track.
Version: unspecified → 48 Branch
Each email has its own layer because it is marked as will-change. That is the right thing to do. But we should still be single-tiling instead of multi-tiling, so the above patch makes that the case.
Flags: needinfo?(matt.woodrow)
Comment on attachment 8765412 [details]
Bug 1263083 - Use single-tile layer when less than half the tile size in either dimension;

https://reviewboard.mozilla.org/r/60804/#review57906

The heuristic sounds good, but please factor out the condition into a separate variable and add a comment. Something like "don't waste more than 50% of the tile's pixels in either direction".
Attachment #8765412 - Flags: review?(mstange) → review+
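The heuristic named in the patch title, with the condition factored out as the review asks, could look roughly like the Python sketch below. This is an illustration of the idea only, not the Gecko change itself: the function names and the 512-pixel tile size are assumptions, and the real code may include other conditions that are not described in this thread.

```python
def fits_in_one_tile(width, height, tile_w, tile_h):
    """Older check described in this bug: both dimensions under the tile size."""
    return width < tile_w and height < tile_h


def wastes_half_a_tile(width, height, tile_w, tile_h):
    """True if the layer fills less than half a tile in either dimension."""
    return width * 2 < tile_w or height * 2 < tile_h


def use_single_tile(width, height, tile_w, tile_h):
    """Prefer a single-tiled layer when tiling would mostly waste pixels."""
    return (fits_in_one_tile(width, height, tile_w, tile_h)
            or wastes_half_a_tile(width, height, tile_w, tile_h))


TILE = 512  # assumed tile edge length, for illustration only
print(fits_in_one_tile(1900, 60, TILE, TILE))   # wide, short email row
print(use_single_tile(1900, 60, TILE, TILE))
print(use_single_tile(1900, 1200, TILE, TILE))  # large layer stays multi-tiled
```

With these assumed sizes, the older check rejects the short, wide row because its width exceeds the tile width, while the new check accepts it because its height is under half the tile height. A layer that is large in both dimensions is still multi-tiled.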
Would having a preference for the magic factor be overkill?
(In reply to Milan Sreckovic [:milan] from comment #29)
> Would having a preference for the magic factor be overkill?

I would say so, but if Markus or you strongly disagree I'm happy to add it.
Comment on attachment 8765412 [details]
Bug 1263083 - Use single-tile layer when less than half the tile size in either dimension;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/60804/diff/1-2/
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/81f94d11a924
Use single-tile layer when less than half the tile size in either dimension; r=mstange
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/81f94d11a924
Status: REOPENED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Nightly is *dramatically* better now with Inbox. Thanks. I recommend we uplift this to 48 if you are comfortable.
Flags: needinfo?(jnicol)
Comment on attachment 8765412 [details]
Bug 1263083 - Use single-tile layer when less than half the tile size in either dimension;

I'd uplift this to aurora no questions asked, possibly beta but we are late in the cycle?

Approval Request Comment
[Feature/regressing bug #]: bug 1221094
[User impact if declined]: Larger memory allocation, resulting in some websites (like google inbox) being unusably slow.
[Describe test coverage new/current, TreeHerder]: Been on central for almost a week.
[Risks and why]: Lowish. The change is straightforward. Could have a potential negative effect on websites with extremely long thin layers, but these will be rare and probably don't work great anyway.
[String/UUID change made/needed]: N/A
Flags: needinfo?(jnicol)
Attachment #8765412 - Flags: approval-mozilla-beta?
Attachment #8765412 - Flags: approval-mozilla-aurora?
Not late - 48 happens to be an eight week beta, we have until the end of July.
Comment on attachment 8765412 [details]
Bug 1263083 - Use single-tile layer when less than half the tile size in either dimension;

This patch fixes a performance regression. Take it in 48 beta 6 and aurora.
Attachment #8765412 - Flags: approval-mozilla-beta?
Attachment #8765412 - Flags: approval-mozilla-beta+
Attachment #8765412 - Flags: approval-mozilla-aurora?
Attachment #8765412 - Flags: approval-mozilla-aurora+
Tracking this because of the risk mentioned in comment #36.
I could not reproduce this bug using Fx 48.0a1 build ID:20160407062903 and ID:20160309030419, on OS X 10.11.1 (tested with both e10s enabled and disabled).

Can you please confirm if it is still reproducible on your platform?
Flags: qe-verify+ → needinfo?(jnicol)
I could never reproduce the slowness on my platform (Ubuntu), but I can confirm that the patch fixes what I believe was the cause of the slowness on OS X.
Flags: needinfo?(jnicol)
Depends on: 1290149