Closed
Bug 1273024
Opened 9 years ago
Closed 9 years ago
[e10s] loading https://500px.com/popular and clicking a photo causes Nightly to hang/use large amount of memory, round 2
Categories
(Core :: Layout, defect)
Tracking
()
RESOLVED
FIXED
mozilla49
People
(Reporter: Gavin, Assigned: kats)
References
Details
(Keywords: crash, Whiteboard: [MemShrink])
Attachments
(3 files, 1 obsolete file)
13.18 KB,
text/plain
|
Details | |
127.85 KB,
patch
|
Details | Diff | Splinter Review | |
959 bytes,
patch
|
mattwoodrow
:
review+
ritu
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #1251527 +++
This bug has reared it's ugly head again.
STR:
1) Start Nightly (I'm currently testing the May 15 build, built off mozilla-central 4a8ed77f6bb573f20980056bf8c1dadd125c1a85) with a new profile
2) load https://500px.com/popular
3) click one of the photos to "zoom in"
Nightly hangs. At first just the content process is non-responsive (switching tabs results in content spinner), but eventually the whole app gets unresponsive too. This has also caused OS X to freak out and pop up the "System out of memory" dialog, which in one case caused all my other open apps to be paused and hang and required a reboot.
Reporter | ||
Comment 1•9 years ago
|
||
I'm not sure whether this is a regression or if the previous bug was just never fixed completely. I tried looking for a regression range but ran into issues with this only be intermittently reproducible :(
No longer depends on: 1251527
Reporter | ||
Comment 2•9 years ago
|
||
I can reproduce this on the latest beta (FIREFOX_47_0b5_BUILD1) and developer edition (f00f9e754ce71fc5efa3e4334afea5cf6ad6c215), but not on release.
status-firefox46:
--- → unaffected
status-firefox47:
--- → affected
status-firefox48:
--- → affected
tracking-firefox47:
--- → ?
tracking-firefox48:
--- → ?
tracking-firefox49:
--- → ?
Keywords: regressionwindow-wanted
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:49.0) Gecko/20100101 Firefox/49.0
Build ID: 20160516030211
I have tested your issue on latest Nightly build (20160516030211) and managed to reproduce it, but only if e10s is enabled. I have opened the link you provided and when clicking on a image to enlarge it, Nightly hangs. I've also performed a regression and below are my results:
Last good revision: 12b082de19be6a96d0f834771fb81dbdd4840721
First bad revision: 664f46e3d8845b5e9e5b78608ddd867ba072237c
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=12b082de19be6a96d0f834771fb81dbdd4840721&tochange=664f46e3d8845b5e9e5b78608ddd867ba072237c
Looks like the following bug has the changes which introduced the regression:
https://bugzilla.mozilla.org/show_bug.cgi?id=1111386
tracking-e10s:
--- → ?
Keywords: regressionwindow-wanted
Comment 4•9 years ago
|
||
Tracking for 48/49 based on the bad user experience and the fact that it can make the entire system unresponsive.
Comment 5•9 years ago
|
||
Paul, can you re-run your regression finding? Bug 1111386 sounds really unlikely.
Flags: needinfo?(paul.pasca)
I did perform another regression on this issue and I've obtained different results. I don't know if this is relevant, but this time I've used OS X 10.11 and last time it was OS X 10.10.
Last good revision: c525c5164cdde373a636d525e53c77f52d8fd693
First bad revision: c99e8abb5e00eaa439a3be7864df0e4135326349
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c525c5164cdde373a636d525e53c77f52d8fd693&tochange=c99e8abb5e00eaa439a3be7864df0e4135326349
Flags: needinfo?(paul.pasca)
![]() |
||
Comment 7•9 years ago
|
||
I can reproduce:
1) open most popular photo page
2) click on any photo which should load a photo specific web page
result: beach balled nightly for about 15 seconds (old mac air). Once the photo page loads it responds normally.
3) click back button to the most popular page
4) click on the photo again
result: another long hang, then the page loads
![]() |
||
Updated•9 years ago
|
Assignee: nobody → gkrizsanits
Comment 8•9 years ago
|
||
Sorry Paul but I have to ask you to do this a third time. The regression range you found now is the one during which the original bug (bug 1251527, referenced in comment 0) was caused. But that regression was fixed on March 9th, 2016. Can you re-run your regression check with a starting window that starts after March 9th?
Flags: needinfo?(paul.pasca)
Comment 9•9 years ago
|
||
Not sure if this profiler output is extremely helpful but I link it anyway, it reflects the steps described in Comment 7. https://cleopatra.io/#report=dd18c637dee1c72d04b3ddfb60a0889458d90cde
In short if I turn APZ off this problem is gone.
This looks really bad on the latest MBP retina as well. The whole browser becomes unresponsive, so I wonder why the profiler does not show huge chrome process janks.
Assignee: gkrizsanits → bugmail.mozilla
Flags: needinfo?(bugmail.mozilla)
(In reply to Markus Stange [:mstange] from comment #8)
> Sorry Paul but I have to ask you to do this a third time. The regression
> range you found now is the one during which the original bug (bug 1251527,
> referenced in comment 0) was caused. But that regression was fixed on March
> 9th, 2016. Can you re-run your regression check with a starting window that
> starts after March 9th?
Sorry, but the build from 10 March 2016 is bad. If I perform a regression starting from 10 March 2016, I get an error that says that the build was expected to be good.
Flags: needinfo?(paul.pasca)
![]() |
||
Updated•9 years ago
|
Blocks: apz-desktop
Assignee | ||
Comment 11•9 years ago
|
||
So when I follow the STR (local m-c build) I definitely see slowness and it takes a while for the photo to show up, but it does work and doesn't crash. I see some interesting things in the console output such as:
[Child 23831] WARNING: Failed to create DrawTarget of sufficient size.: file /Users/kats/zspace/mozilla-git/gfx/thebes/gfxContext.cpp, line 1259
I added more logging to this, and the size being requested is 1436 x 999999 which seems rather large indeed. It might be that if you have a system with enough RAM (>5G ?) it actually tries to allocate that buffer, eating up all the RAM and then some. On my machine I guess it fails and goes into the fallback path at [1].
The 999999 is coming from a fixed-position div in the lightbox that pops up when you click on a photo. I'm also attaching a layers dump, which shows a ContainerLayer and a ColorLayer with that height.
[1] http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxContext.cpp?rev=c992422247b7#1258
Flags: needinfo?(bugmail.mozilla)
Assignee | ||
Comment 12•9 years ago
|
||
If I use the devtools to shrink that fixed-pos element to 1000px instead of 999999px tall the page responds much better, so I'm fairly confident that's the cause of the slowness. We are probably not clamping that height somewhere that we should be. It has an opacity of 0.97 so it's going through some group opacity code - stacktrace attached to the point of the allocation failure, if that's of any use.
Assignee | ||
Comment 13•9 years ago
|
||
This is more useful. The app unit height is 59999940, grepping for that in the file gives you the relevant display items.
Attachment #8754970 -
Attachment is obsolete: true
Assignee | ||
Comment 14•9 years ago
|
||
Based on the patch in bug 1251527, I tried commenting out the two lines at [1] and that fixed the problem. But I don't know if that's the right fix - Matt, do you want to take this?
[1] http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDisplayList.cpp?rev=34eda1b91ca3&mark=4916-4917#4916
Assignee: bugmail.mozilla → nobody
Flags: needinfo?(matt.woodrow)
![]() |
||
Updated•9 years ago
|
No longer blocks: apz-desktop
Comment 15•9 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #14)
> Based on the patch in bug 1251527, I tried commenting out the two lines at
> [1] and that fixed the problem. But I don't know if that's the right fix -
> Matt, do you want to take this?
>
> [1]
> http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDisplayList.
> cpp?rev=34eda1b91ca3&mark=4916-4917#4916
That seems right, setting the visible region to the entirety of the bounds is unlikely to ever be what we want.
We should see if this passes try.
Flags: needinfo?(matt.woodrow)
Comment 16•9 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #15)
> We should see if this passes try.
let's see:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ec1584aff2da
This seems to be an e10s only issue. As Fx47 will ship with e10s off, this is now a wontfix. Please let me know if this also affects non-e10s users.
Comment 18•9 years ago
|
||
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #16)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=ec1584aff2da
Very noisy run on windows, but all the oranges seem to be unrelated intermittents, but it's hard to tell for sure... locally I could not reproduce any of the failures but I'm on OSX. I would risk a landing attempt...
Kats, do you want to file the patch for review and land it? I don't want to steal the bug from you, just had a few extra cycles to test it for you...
Flags: needinfo?(bugmail.mozilla)
Assignee | ||
Comment 19•9 years ago
|
||
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #18)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #16)
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=ec1584aff2da
>
> Very noisy run on windows, but all the oranges seem to be unrelated
> intermittents, but it's hard to tell for sure... locally I could not
> reproduce any of the failures but I'm on OSX. I would risk a landing
> attempt...
Yeah, I was seeing those same windows oranges on other pushes, I think something to do with the clipboard is busted on windows try. Pretty sure it's unrelated to the patch.
> Kats, do you want to file the patch for review and land it? I don't want to
> steal the bug from you, just had a few extra cycles to test it for you...
Yup, I'll put the patch up for review. Thanks for doing the try push!
Flags: needinfo?(bugmail.mozilla)
Assignee | ||
Comment 20•9 years ago
|
||
Assignee: nobody → bugmail.mozilla
Attachment #8755873 -
Flags: review?(matt.woodrow)
Updated•9 years ago
|
Attachment #8755873 -
Flags: review?(matt.woodrow) → review+
Comment 21•9 years ago
|
||
Comment 22•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Assignee | ||
Comment 23•9 years ago
|
||
Comment on attachment 8755873 [details] [diff] [review]
Patch
Approval Request Comment
[Feature/regressing bug #]: bug 1231538
[User impact if declined]: some pages use up a ton of memory or are slow to respond because they are eating too much memory
[Describe test coverage new/current, TreeHerder]: not really :/
[Risks and why]: low risk, the code is well understood and the bug appears to have been mostly an oversight.
[String/UUID change made/needed]: none
Attachment #8755873 -
Flags: approval-mozilla-aurora?
Comment on attachment 8755873 [details] [diff] [review]
Patch
Reduces e10s memory usage, Aurora48+, has been in Nightly for a few days.
Attachment #8755873 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 25•9 years ago
|
||
bugherder uplift |
You need to log in
before you can comment on or make changes to this bug.
Description
•