Closed
Bug 1100253
Opened 10 years ago
Closed 10 years ago
~40M AWSY regression after bug 1084136
Categories
(Core :: Graphics: ImageLib, defect)
Tracking
RESOLVED
FIXED
Release | Tracking | Status
---|---|---
firefox35 | --- | unaffected
firefox36 | + | fixed
firefox37 | + | fixed
People
(Reporter: Swatinem, Assigned: seth)
References
(Blocks 1 open bug)
Details
(Keywords: regression, Whiteboard: [MemShrink:P1][awsy+][awsy-])
There is a huge memory spike on November 7th.
According to the pushlog, that’s when e10s got enabled by default on nightly.
Is AWSY running and measuring that?
If so, I would argue that the memory usage increase is to be expected.
Comment 1•10 years ago
It's definitely e10s. The pushlog for the jump is
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e228bf3b5f02&tochange=c4b831696f15, which includes bug 1093691, which enabled e10s.
The next step is to work out exactly what we're now measuring (I think it's the sum of the two processes, e.g. "resident" is "resident from process 1" + "resident from process 2") and whether that still makes sense. Measuring separate processes separately is probably better.
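For context on why summing "resident" across processes can mislead: pages shared between the parent and the content process are counted once per process, so RSS + RSS overstates the total, while RSS + USS (only the child's private pages) avoids the double-counting. Below is a minimal sketch of that distinction, assuming Linux and the standard /proc/<pid>/smaps fields; it is illustrative only, not AWSY's actual measurement code, and the program and helper names are made up.

```cpp
// Sketch: compare "RSS + RSS" with "RSS + USS" for two processes.
// The /proc/<pid>/smaps field names are standard Linux; everything else
// (names, structure) is hypothetical.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

struct MemStats {
  long rssKiB = 0;  // resident set size
  long ussKiB = 0;  // unique set size: private clean + private dirty
};

MemStats ReadSmaps(int pid) {
  MemStats stats;
  std::ifstream smaps("/proc/" + std::to_string(pid) + "/smaps");
  std::string line;
  while (std::getline(smaps, line)) {
    std::istringstream iss(line);
    std::string field;
    long kib = 0;
    iss >> field >> kib;
    if (field == "Rss:") {
      stats.rssKiB += kib;
    } else if (field == "Private_Clean:" || field == "Private_Dirty:") {
      stats.ussKiB += kib;  // pages not shared with any other process
    }
  }
  return stats;
}

int main(int argc, char** argv) {
  if (argc < 3) {
    std::cerr << "usage: memsum <parent-pid> <content-pid>\n";
    return 1;
  }
  MemStats parent = ReadSmaps(std::stoi(argv[1]));
  MemStats content = ReadSmaps(std::stoi(argv[2]));
  // RSS + RSS counts pages shared between the two processes twice.
  std::cout << "RSS + RSS: " << parent.rssKiB + content.rssKiB << " KiB\n";
  std::cout << "RSS + USS: " << parent.rssKiB + content.ussKiB << " KiB\n";
  return 0;
}
```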
Whiteboard: [MemShrink]
Updated•10 years ago
tracking-e10s:
--- → ?
Summary: ~60M AWSY regression on November 7 → e10s: ~60M AWSY regression on November 7
Comment 2•10 years ago
Does the AWSY platform (linux?) have the shared page stuff from b2g? I assume we're not trying to use NUWA on desktop yet, right?
Comment 3•10 years ago
No NUWA (after looking up what it is). e10s on desktop *nix is fork+exec. It defaults to a single content process (which is a separate executable). Multiple content processes run well with very few problems if the preference is set.
A blank-page content process takes ~70 MB RSS. A large chunk of that could presumably be saved by forking instead of fork+exec. I've been tempted to file a bug/feature request, as I haven't spotted one suggesting it (but I've been running fine as is).
Comment 4•10 years ago
In order to fix AWSY after e10s landed I had to switch to mozmill 2.0; my understanding is that this actually disables e10s. So in this case I don't think it's actually a reporting problem (i.e. RSS + RSS vs. RSS + USS).
This could indicate a couple of things:
- e10s is causing high memory usage even when disabled
- mozmill 2.0 is using more memory than mozmill 1.5
- there's still an extra content process even though e10s is disabled
Comment 5•10 years ago
I can confirm there's only one process w/ the e10s pref disabled, so it's definitely not AWSY numbers being skewed by multiple processes.
Comment 6•10 years ago
I've backported the pref changes from mozmill-2.0 to our 1.5 version of AWSY. I'll restart our server w/ the changes, and we can see whether the issue is mozmill-2.0 related.
Comment 7•10 years ago
Reverted to mozmill-1.5 and reran overnight; the numbers look better.
Still elevated, but not by as much:
RSS: After TP5
RSS: After TP5 [+30s]
RSS: After TP5 [+30s, forced GC]
Back to normal (more or less):
RSS: After TP5, tabs closed
RSS: After TP5, tabs closed [+30s]
RSS: After TP5, tabs closed [+30s, forced GC]
Never really changed:
RSS: Fresh start
RSS: Fresh start [+30s]
So we can see that mozmill-2.0 had an impact, but there's still a regression. Whether or not it was e10s is hard to say for sure at this point. I've queued a rerun of the Nov 7 - Nov 17 timerange, but that's going to take a while.
Comment 8•10 years ago
Where can I find the actual data?
Flags: needinfo?(erahm)
Comment 9•10 years ago
The data isn't super reliable right now, but the values for the past day should have eliminated 2.0 from the mix: https://areweslimyet.com/
Zoom in to the past few weeks. Each node can have 1 or more changesets associated w/ it. You can click a [view] next to changeset you're interested in and dig into the actual memory numbers. Each iteration also has an [export] option that'll give you an about:memory compatible report.
Flags: needinfo?(erahm)
Comment 10•10 years ago
This looks like a 40MB regression on explicit for the "After TP5" categories. The primary offenders are "cached/window" entries. Below are the results of diffing an 11/6 result w/ an 11/18 result.
High-level:
> 40.46 MB (100.0%) -- explicit
> ├──31.79 MB (78.58%) ++ window-objects
> ├───3.97 MB (09.82%) ── heap-unclassified
> ├───3.55 MB (08.78%) ++ images
So that's *mostly* window-objects; let's look at a snippet under there:
> ├──31.79 MB (78.58%) -- window-objects
> │ ├───8.18 MB (20.23%) -- top(http://localhost:8088/tp5/icious.com/www.delicious.com/index.html, id=2484)
> │ │ ├──8.14 MB (20.11%) ++ cached/window(http://localhost:8058/tp5/dailymail.co.uk/www.dailymail.co.uk/ushome/index.html)
> │ │ └──0.05 MB (00.12%) ++ (2 tiny)
> │ ├───3.67 MB (09.06%) -- top(http://localhost:8083/tp5/etsy.com/www.etsy.com/category/geekery/videogame.html, id=2459)
> │ │ ├──3.73 MB (09.23%) ++ cached/window(http://localhost:8053/tp5/douban.com/www.douban.com/index.html)
> │ │ └──-0.07 MB (-0.17%) ++ (2 tiny)
> │ ├───3.33 MB (08.24%) -- top(http://localhost:8090/tp5/web.de/web.de/index.html, id=2490)
> │ │ ├──3.30 MB (08.15%) ++ cached/window(http://localhost:8060/tp5/indiatimes.com/www.indiatimes.com/index.html)
> │ │ └──0.04 MB (00.09%) ++ (2 tiny)
> │ ├───3.30 MB (08.15%) -- (41 tiny)
> │ │ ├──0.38 MB (00.95%) -- top(http://localhost:8056/tp5/digg.com/dads.new.digg.com/view.html@kw=zone%253A5&kw=mozilla&kw=nice&kw=logo&kw=firefox&kw=mozzilla&kw=new&kw=proposal&kw=really&kw=browser&kw=check&kw=pagetype%253Apermalink&template=5.html, id=2566)
> │ │ │ ├──0.38 MB (00.94%) ++ cached/window(http://localhost:8056/tp5/digg.com/dads.new.digg.com/view.html@kw=zone%253A5&kw=mozilla&kw=nice&kw=logo&kw=firefox&kw=mozzilla&kw=new&kw=proposal&kw=really&kw=browser&kw=check&kw=pagetype%253Apermalink&template=5.html)
> │ │ │ └──0.00 MB (00.00%) ++ active/window(http://localhost:8056/tp5/digg.com/dads.new.digg.com
...snip...
So that's virtually all under |window-objects/top/cached/window|. I get the feeling some caching behavior changed for the worse. There also seems to be a regression in images and heap-unclassified, but that could just be cache related or random noise.
Comment 11•10 years ago
The 24th (bug 1101193) will be a good time to watch.
Comment 12•10 years ago
Kyle, I heard you might have insight into what's going on w/ caching here.
Flags: needinfo?(khuey)
Comment 13•10 years ago
That's the bfcache. Are we evicting it properly in the content process?
Flags: needinfo?(khuey)
Comment 14•10 years ago
This is with e10s disabled.
Comment 15•10 years ago
So this doesn't have anything to do with e10s?
Comment 16•10 years ago
(In reply to Bill McCloskey (:billm) from comment #15)
> So this doesn't have anything to do with e10s?
The current thinking is no, AFAICT the issue just slipped in while AWSY was busted due to e10s being enabled. I'll clear the e10s stuff for now.
tracking-e10s:
? → ---
Summary: e10s: ~60M AWSY regression on November 7 → ~40M AWSY regression on November 7
Comment 17•10 years ago
Regression range is: Thu, 06 Nov 2014 18:54:00 GMT -> Fri, 07 Nov 2014 03:08:20 GMT
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=2033324d4571&tochange=e60e90aa209d
AWSY is currently backfilling for that date range.
Comment 18•10 years ago
:smaug, we think this regression is bfcache-related; can you help track it down? See comment 10 and comment 13.
Flags: needinfo?(bugs)
Comment 19•10 years ago
I don't see anything obviously DOM-related in there. Bug 1084136 is in there, but I only suspect that because of how tricky imagelib is. ;)
Comment 20•10 years ago
The only session-history-related changes landed recently were on 2014-10-15 (bug 855443) and 2014-11-14 (bug 1090918).
What all did mozmill 2.0 change? Comment 10 is talking about an 11/6 result, which had mozmill 2.0, right?
Flags: needinfo?(bugs)
Comment 21•10 years ago
In that regression range, bug 1084136 sounds like the most plausible candidate for causing memory usage changes.
Comment 22•10 years ago
seth, would it be possible that bug 1084136 caused changes to memory usage patterns?
Flags: needinfo?(seth)
Assignee
Comment 23•10 years ago
I'm not really sure how it would. It should have *reduced* memory usage, since it eliminated a bunch of state that we used to keep track of for every image.
Bugs are always possible, though.
Flags: needinfo?(seth)
Comment 24•10 years ago
(In reply to Olli Pettay [:smaug] from comment #20)
> The only session history related changes recently landed 2014-10-15 (bug
> 855443) and
> 2014-11-14 (bug 1090918).
>
> What all did mozmill 2.0 change? Comment 10 is talking about 11/6 result
> which had mozmill 2.0, right?
That was a diff of runs that used 1.5 (albeit w/ 10 days in between).
Comment 25•10 years ago
Range has been reduced: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=661c58109d16&tochange=e60e90aa209d
We can definitely rule out e10s at this point.
Comment 26•10 years ago
And it looks like bug 1084136 is the offender: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=f129b17f9067&tochange=f245578c4fa4
I'm pretty sure the repro steps are:
#1 - open tab to site A
#2 - navigate to site B in the same tab
(maybe repeat)
#3 - open about:memory, measure
#4 - note the |window-objects/top/cached/window| entry
Depends on: 1084136
Flags: needinfo?(seth)
Summary: ~40M AWSY regression on November 7 → ~40M AWSY regression after bug 1084136
Comment 27•10 years ago
[Tracking Requested - why for this release]: Increased memory usage due to holding alive pages for longer.
Blocks: 1084136
status-firefox35:
--- → unaffected
status-firefox36:
--- → affected
status-firefox37:
--- → affected
tracking-firefox36:
--- → ?
tracking-firefox37:
--- → ?
No longer depends on: 1084136
Keywords: regression
Updated•10 years ago
Component: Untriaged → ImageLib
Comment 28•10 years ago
It would appear that bug 1089046 improves the situation for the "Tabs Closed" case. We saw a 40MB drop in explicit when it landed on 11/15.
The following are all still regressed:
Explicit: After TP5
Explicit: After TP5 [+30s]
Explicit: After TP5 [+30s, forced GC]
Updated•10 years ago
Comment 29•10 years ago
Seth, any idea what accounts for the remainder? Presumably something related to the fix you already landed...
Whiteboard: [MemShrink] → [MemShrink:P1]
Comment 30•10 years ago
(In reply to Nicholas Nethercote [:njn] from comment #29)
> Seth, any idea what accounts for the remainder? Presumably something related
> to the fix you already landed...
The current dev cycle finishes in a couple of days. This regression is bad enough that we need to consider backing it out if it doesn't get fixed ASAP.
Assignee
Comment 31•10 years ago
(In reply to Nicholas Nethercote [:njn] from comment #29)
> Seth, any idea what accounts for the remainder? Presumably something related
> to the fix you already landed...
No, I don't. There's no obvious explanation as to why the code in bug 1084136 *or* bug 1089046 should affect whether we leak or don't leak window objects. Those bugs really only affect the notifications we send from ImageLib. It's likely that the notification changes are tickling a bug in some notification consumer.
I'll go through the different patches in that bug to try to narrow down when the problem was introduced. That should make it much easier to pinpoint the problem.
Flags: needinfo?(seth)
Assignee
Comment 32•10 years ago
So following these steps:
(1) Minimize memory usage in an about:memory tab
(2) Load reddit.com in tab A
(3) Load reddpics.com in tab A
(4) Measure memory usage in the about:memory tab
I looked at a set of revisions and observed the cached/window entries:
changeset: 214508:30a89ed1abf3. 6.09 MB (02.88%) ++ cached/window(http://www.reddit.com/)
changeset: 214518:a28e572e9919. 6.00 MB (03.17%) ++ cached/window(http://www.reddit.com/)
changeset: 214528:f245578c4fa4. 5.84 MB (03.02%) ++ cached/window(http://www.reddit.com/)
214508 is from well before bug 1084136. 214518 is the first patch in bug 1084136, and 214528 is the last.
My conclusion is that we still don't have STR that demonstrate the problem locally. =(
Assignee
Comment 33•10 years ago
Eric, Nicholas, what's the best path forward here? As I mentioned in the previous comment, right now I don't know how to reproduce this locally. I also realized that I don't have access to the VPN now (something's wrong with my config; can't figure it out) and I probably won't until sometime tomorrow afternoon.
If you see this before I get access to the VPN, I'd really appreciate it if you could trigger the first couple of levels of a binary search between revision 214518 and revision 214528. If we can figure out which part caused the regression, it will be a lot easier to fix it.
Flags: needinfo?(n.nethercote)
Flags: needinfo?(erahm)
Assignee
Comment 34•10 years ago
Update: Eric has tried running AWSY locally with my patches and observed a ~23MB *reduction* in memory usage. So this thing must be timing dependent.
Eric was kind enough to try to set up the bisect on AWSY, but apparently there's some issue with AWSY keeping it from running against specific commits.
So we seem to be stuck, with no way to dig further into this.
Flags: needinfo?(n.nethercote)
Flags: needinfo?(erahm)
Assignee
Comment 35•10 years ago
So johns let me know that we can still do this AWSY bisection via try builds. I'm going to push a sequence of try jobs now that will do the first couple of binary search steps. (Due to time limitations I think we're going to have to try some possibilities in parallel.)
For rev 214520: https://tbpl.mozilla.org/?tree=Try&rev=58a902dff876
For rev 214523: https://tbpl.mozilla.org/?tree=Try&rev=ae994a7f4675
For rev 214526: https://tbpl.mozilla.org/?tree=Try&rev=b2667ebc9625
When these are done, I'll trigger AWSY runs for them. Based on those results, we'll need to push two more try jobs/AWSY runs to narrow it down to the offending revision.
Assignee
Comment 36•10 years ago
Alright, so Eric advised that I should go ahead and just push everything now. Here's the full list, including the ones from the previous comment:
For rev 214518: https://tbpl.mozilla.org/?tree=Try&rev=8d8dc905664a
For rev 214519: https://tbpl.mozilla.org/?tree=Try&rev=c15f5e5160db
For rev 214520: https://tbpl.mozilla.org/?tree=Try&rev=58a902dff876
For rev 214521: https://tbpl.mozilla.org/?tree=Try&rev=f22fc3822456
For rev 214522: https://tbpl.mozilla.org/?tree=Try&rev=57e90f81bc6f
For rev 214523: https://tbpl.mozilla.org/?tree=Try&rev=ae994a7f4675
For rev 214524: https://tbpl.mozilla.org/?tree=Try&rev=e6d96d1c9851
For rev 214525: https://tbpl.mozilla.org/?tree=Try&rev=c74dbfe3f4a2
For rev 214526: https://tbpl.mozilla.org/?tree=Try&rev=b2667ebc9625
For rev 214527: https://tbpl.mozilla.org/?tree=Try&rev=6b0385fd22a2
For rev 214528: https://tbpl.mozilla.org/?tree=Try&rev=09c747d78c63
Assignee
Comment 37•10 years ago
All those try jobs finished building, so I queued them up in an AWSY series. The results should appear here:
https://areweslimyet.com/?series=bug1100253
Assignee
Comment 38•10 years ago
Right now it looks very much to me like the culprit is part 2, which dealt with onload blocking. That's rev 214519, which corresponds to this try job in the AWSY series: https://tbpl.mozilla.org/?tree=Try&rev=c15f5e5160db
(Note that the AWSY series doesn't list the parts in the correct order. Sigh.)
I will push an experimental patch to try to restore the notification order to how it was before part 2. Let's see how it affects the AWSY results.
Assignee
Comment 39•10 years ago
OK, here's one such patch, which makes us now send UnblockOnload after all other notifications, as we did before.
https://tbpl.mozilla.org/?tree=Try&rev=1f96124c1a15
Comment 40•10 years ago
Looks like you're getting closer, so I'll cancel the needinfo request that was made of me.
Assignee
Comment 41•10 years ago
Eric, could you please add the try job in comment 39 to AWSY series bug1100253? I won't have access to the Mozilla VPN for a few days.
Assignee
Updated•10 years ago
Flags: needinfo?(erahm)
Comment 42•10 years ago
Tests are now running for the try build in comment 39.
Updated•10 years ago
Flags: needinfo?(erahm)
Assignee
Comment 43•10 years ago
(In reply to Eric Rahm [:erahm] from comment #42)
> Tests are now running for the try build in comment 39.
Thanks, Eric. It looks like that change alone was not sufficient to fix the problem.
The three other possibilities I see at this point are:
(1) There is a bug in that patch that makes us somehow not send UnblockOnload after sending BlockOnload, which would certainly cause problems.
(2) There is a dependency in one of the observers on BlockOnload/UnblockOnload being sent *multiple times*, which is the thing that patch was trying to eliminate.
Case (1) would be most easily debugged by adding assertions. What happens if we assert during an AWSY run? Is the output logged somewhere?
Case (2) really is a bug in whichever observer is causing the problem. We can narrow it down by modifying the patch to send BlockOnload/UnblockOnload multiple times, verifying that that fixes the problem, then adding code to ignore multiple BlockOnload/UnblockOnload calls in each observer individually until the problem reappears.
Assignee
Comment 44•10 years ago
(Two other possibilities, apparently. =)
Comment 45•10 years ago
(In reply to Seth Fowler [:seth] from comment #43)
> Case (1) would be most easily debugged by adding assertions. What happens if
> we assert during an AWSY? Is the output logged somewhere?
I believe we have logs for each test run, just make sure you use release assertions.
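For case (1) from comment 43, a release assertion is what would actually be visible in an opt AWSY run, since MOZ_ASSERT compiles away outside DEBUG builds. Here is a minimal sketch of where such assertions might go; the macros are the real MFBT ones from "mozilla/Assertions.h", but the observer class and its member are hypothetical stand-ins, not the actual ImageLib code.

```cpp
#include <cstdint>
#include "mozilla/Assertions.h"

// Hypothetical observer, used only to illustrate where the assertions go.
class HypotheticalOnloadObserver {
public:
  void BlockOnload() { ++mOutstandingBlocks; }

  void UnblockOnload() {
    // MOZ_ASSERT compiles away in non-DEBUG builds (which is what AWSY runs),
    // so a failure here would be invisible in the AWSY logs.
    MOZ_ASSERT(mOutstandingBlocks > 0, "unbalanced UnblockOnload");
    // MOZ_RELEASE_ASSERT stays in opt builds; a failure aborts the run and
    // should show up in the per-run logs mentioned above.
    MOZ_RELEASE_ASSERT(mOutstandingBlocks > 0, "unbalanced UnblockOnload");
    --mOutstandingBlocks;
  }

  ~HypotheticalOnloadObserver() {
    // Case (1): a BlockOnload that never got its matching UnblockOnload.
    MOZ_RELEASE_ASSERT(mOutstandingBlocks == 0,
                       "destroyed while still blocking onload");
  }

private:
  uint32_t mOutstandingBlocks = 0;
};
```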
Comment 46•10 years ago
So this regression is still around, what's the status?
Updated•10 years ago
Flags: needinfo?(seth)
Assignee
Comment 47•10 years ago
Right, no progress has been made recently because I ended up having no time during Mozlandia and then I had to focus on more serious crashes and rendering regressions. My plan is to come back to this tomorrow.
Flags: needinfo?(seth)
Assignee
Comment 48•10 years ago
Alright, sorry for the long delay on coming back to this. I've pushed a try job that will investigate case (2) above, since I think it's the more likely case. Will add this to the AWSY series once the try job completes.
https://tbpl.mozilla.org/?tree=Try&rev=57389a1f4c48
Assignee
Comment 49•10 years ago
Actually, I'd like test results too, so I've pushed the same patch as in comment 48, but this time running tests.
https://tbpl.mozilla.org/?tree=Try&rev=0101f616c516
Assignee
Comment 50•10 years ago
The patch in comment 48 appears, indeed, to fix the regression - it causes an improvement of ~38MB on AWSY. It also causes some oranges! That's OK, though, because the intention is not to use this patch directly. We now know that some observer wants the onload block/unblock notifications sent multiple times. The next step is to investigate and determine which observer is causing the problem.
Comment 51•10 years ago
(In reply to Seth Fowler [:seth] from comment #50)
> The patch in comment 48 appears, indeed, to fix the regression - it causes
> an improvement of ~38MB on AWSY.
That sounds much better! Let me know if I can help out.
Assignee
Comment 52•10 years ago
So there are only three implementations of imgIOnloadBlocker that actually do any onload blocking. I've prepared a patch stack that stops each of them from propagating the onload blocking to the document if more than one onload block/unblock notification is received. These are built on top of the patch from comment 48. Most likely, one of these patches will restore the regression, and that will clarify which imgIOnloadBlocker implementation is buggy.
For nsImageBoxFrame (which is XUL, so I doubt this is the culprit):
https://tbpl.mozilla.org/?tree=Try&rev=3f27019f4d37
For ImageLoader (which handles images loaded through CSS):
https://tbpl.mozilla.org/?tree=Try&rev=f4bd878bcfac
For nsImageLoadingContent (which handles images loaded through JavaScript and the DOM):
https://tbpl.mozilla.org/?tree=Try&rev=fe486e71e340
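The diagnostic idea described above boils down to forwarding at most one BlockOnload and one UnblockOnload per image request to the document and swallowing duplicates. A rough sketch under assumed names (HypotheticalDocument, DeduplicatingOnloadBlocker) follows; this is not the code in those try pushes, which touch the real nsImageBoxFrame, ImageLoader, and nsImageLoadingContent classes.

```cpp
#include <cstdint>

// Stand-in for the document whose onload is being blocked.
class HypotheticalDocument {
public:
  void BlockOnload() { ++mOnloadBlockCount; }
  void UnblockOnload() { --mOnloadBlockCount; }
  int32_t OnloadBlockCount() const { return mOnloadBlockCount; }

private:
  int32_t mOnloadBlockCount = 0;
};

// One instance per image request: forwards at most one block/unblock pair
// to the document, ignoring any duplicate notifications from imagelib.
class DeduplicatingOnloadBlocker {
public:
  explicit DeduplicatingOnloadBlocker(HypotheticalDocument* aDoc)
    : mDoc(aDoc) {}

  void OnBlockOnload() {
    if (mForwardedBlock) {
      return;  // duplicate BlockOnload; don't block the document again
    }
    mForwardedBlock = true;
    mDoc->BlockOnload();
  }

  void OnUnblockOnload() {
    if (!mForwardedBlock || mForwardedUnblock) {
      return;  // nothing forwarded yet, or already unblocked
    }
    mForwardedUnblock = true;
    mDoc->UnblockOnload();
  }

private:
  HypotheticalDocument* mDoc;
  bool mForwardedBlock = false;
  bool mForwardedUnblock = false;
};
```

If adding a guard like this to one of the three implementations brings the regression back, that implementation is the one depending on the duplicate notifications.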
Assignee
Comment 53•10 years ago
So the big bump seems to come from ImageLoader (f4bd878bcfac).
I'm kinda not surprised, as compared to nsImageLoadingContent, ImageLoader is not very robust. =\
So the next step is to try to figure out what's going wrong in ImageLoader. My hypothesis at this point is that without the duplicate block/unblock onload notifications, ImageLoader ends up blocking onload but never unblocking it somehow, which indirectly causes the regression in memory usage.
Comment 54•10 years ago
Thanks for the diligent investigation, Seth. Unfortunately I'm getting nervous about this... it sounds like it's more complicated than expected, the holiday break is coming up, and I can see this regression hitting Beta without having been fixed. (Not saying it's certain, but possible.) Is there a Plan B?
Assignee
Comment 55•10 years ago
(In reply to Nicholas Nethercote [:njn] from comment #54)
> Thanks for the diligent investigation, Seth. Unfortunately I'm getting
> nervous about this... it sounds like it's more complicated than expected,
> the holiday break is coming up, and I can see this regression hitting Beta
> without having been fixed. (Not saying it's certain, but possible.) Is there
> a Plan B?
It's exactly as complicated as expected from my perspective, because I knew the bug was not in ImageLib, but in one of the observers. I'm still pretty confident that we can get this fixed soon. It's only today that we've identified the root cause.
The timing of both Portlandia and the holiday break has unfortunately given us less usable time to fix this than in other release cycles, but (for me at least) there are still more than two weeks of working time until the next uplift.
Plan B would be to back out every patch that has landed in ImageLib in the last couple of months, pretty much. Not pretty, and it'd be preferable to avoid it, but it's an option. I don't think we need to have that discussion yet, though. If we did that, it should only be from 36 and not from 37.
Comment 56•10 years ago
Thanks for the update. I hope your optimism is well-founded :)
Assignee
Comment 57•10 years ago
OK, so now that we've narrowed down the problem to ImageLoader, I took a look at the code and came up with two patches that might fix the problem.
One is a patch that tries to ensure that we always unblock onload after blocking it in ImageLoader:
https://tbpl.mozilla.org/?tree=Try&rev=fbf289da5419
A second ensures that we block all notifications when cloning an imgRequestProxy in ImageLoader (the existing code tries to do this, but fails badly):
https://tbpl.mozilla.org/?tree=Try&rev=102995acb7fb
Finally, here's a push with both patches at the same time:
https://tbpl.mozilla.org/?tree=Try&rev=bae1a82fc1b6
Once these pushes build I'll run them through AWSY; hopefully one of them fixes the problem.
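One generic way to get the "always unblock after blocking" property the first patch aims for is a scope guard that releases the block on every exit path, including teardown. The sketch below shows the pattern only, under assumed names; it is not the code in the try pushes above.

```cpp
#include <cstdint>
#include <memory>

// Stand-in for the document whose onload is being blocked.
class HypotheticalDocument {
public:
  void BlockOnload() { ++mBlockCount; }
  void UnblockOnload() { --mBlockCount; }

private:
  int32_t mBlockCount = 0;
};

// Blocks onload on construction and is guaranteed to unblock exactly once,
// either explicitly via Release() or when destroyed.
class ScopedOnloadBlock {
public:
  explicit ScopedOnloadBlock(HypotheticalDocument* aDoc) : mDoc(aDoc) {
    mDoc->BlockOnload();
  }
  ~ScopedOnloadBlock() { Release(); }

  void Release() {
    if (mDoc) {
      mDoc->UnblockOnload();
      mDoc = nullptr;  // make a second Release() (or the destructor) a no-op
    }
  }

  ScopedOnloadBlock(const ScopedOnloadBlock&) = delete;
  ScopedOnloadBlock& operator=(const ScopedOnloadBlock&) = delete;

private:
  HypotheticalDocument* mDoc;
};

// Hypothetical loader showing how the guard is held while an image loads.
class HypotheticalImageLoader {
public:
  explicit HypotheticalImageLoader(HypotheticalDocument* aDoc) : mDoc(aDoc) {}

  void OnStartImageLoad() {
    // Block onload while the image is loading.
    mPendingBlock = std::make_unique<ScopedOnloadBlock>(mDoc);
  }

  void OnImageLoadComplete() {
    // Normal path: unblock when the load finishes.
    mPendingBlock.reset();
  }

  // If the loader is torn down mid-load (document destroyed, frame removed),
  // the unique_ptr still runs the guard's destructor, so onload is unblocked.
private:
  HypotheticalDocument* mDoc;
  std::unique_ptr<ScopedOnloadBlock> mPendingBlock;
};
```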
Assignee
Comment 58•10 years ago
Hmm.. I'm unclear why TBPL acts as if it hasn't seen the hashes for the patches from comment 52 before. Just to make sure nothing funny's going on, I pushed just the patches from comment 52 as a baseline, as well:
https://tbpl.mozilla.org/?tree=Try&rev=a2d88c1d056f
Assignee
Comment 59•10 years ago
Hmm... I think something may have gone wrong with my last set of pushes. Let me try again.
One is a patch that tries to ensure that we always unblock onload after blocking it in ImageLoader:
https://tbpl.mozilla.org/?tree=Try&rev=4f7ee42cc7c9
A second ensures that we block all notifications when cloning an imgRequestProxy in ImageLoader (the existing code tries to do this, but fails badly):
https://tbpl.mozilla.org/?tree=Try&rev=1c9bfcefb452
Finally, here's a push with both patches at the same time:
https://tbpl.mozilla.org/?tree=Try&rev=ff90be1ba459
Baseline:
https://tbpl.mozilla.org/?tree=Try&rev=d89482846552
We've got one more week before uplift.
Updated•10 years ago
Assignee: nobody → seth
Assignee
Comment 61•10 years ago
The patch in this try job is believed to resolve the problem, looks good on try, and is based on Aurora tip:
https://tbpl.mozilla.org/?tree=Try&rev=6e7d4c06b47
Unfortunately, AWSY is not accepting try job submissions right now, so until that's fixed I cannot verify that the issue is resolved.
Comment 62•10 years ago
(In reply to Seth Fowler [:seth] from comment #61)
> The patch in this try job is believed to resolve the problem, looks good on
> try, and is based on Aurora tip:
>
> https://tbpl.mozilla.org/?tree=Try&rev=6e7d4c06b47
>
> Unfortunately, AWSY is not accepting try job submissions right now, so until
> that's fixed I cannot verify that the issue is resolved.
It would appear that a 'c' was left off the end of that revision, I've queued up a run for 6e7d4c06b47c.
Assignee
Comment 63•10 years ago
(In reply to Eric Rahm [:erahm] from comment #62)
> It would appear that a 'c' was left off the end of that revision, I've
> queued up a run for 6e7d4c06b47c.
Thanks for noticing that problem and taking care of it, Eric. It's interesting that the 'c' is also missing from the TBPL link, but the link still works - I didn't realize that could happen!
Based upon the results, I've gone ahead and pushed that patch to Aurora. (The work is taking place in bug 1120149.) It looks like it completely resolves the regression once tabs are closed. I'm not sure whether there's still a regression when tabs are open or whether it's just an artifact of the fact that we now wait 60 seconds before discarding surfaces, instead of (IIRC) 15 seconds as in the past.
Comment 64•10 years ago
(In reply to Seth Fowler [:seth] from comment #63)
> Based upon the results, I've gone ahead and pushed that patch to Aurora.
> (The work is taking is place in bug 1120149.) It looks like it completely
> resolves the regression once tabs are closed. I'm not sure whether there's
> still a regression when tabs are open or whether it's just an artifact of
> the fact that we now wait 60 seconds before discarding surfaces, instead of
> (IIRC) 15 seconds as in the past.
Are we going to land the hack on m-c as well? If not, I don't think we'll see a change in memory usage on AWSY, so the regression will remain until we get a proper fix in.
I filed bug 1120607 to follow up on the possible regression from changing the amount of time before discarding surfaces.
Flags: needinfo?(seth)
Assignee
Comment 65•10 years ago
(In reply to Eric Rahm [:erahm] from comment #64)
> Are we going to land the hack on m-c as well? If not I don't think we'll see
> a change in memory usage on AWSY so the regression will remain until we get
> a proper fix in.
Yes, but it needs to be rewritten again. Unfortunately that code has been in a constant state of flux lately.
However, if I get it done soon, we should be able to use the same patch for Aurora and trunk, as they obviously haven't diverged much yet.
I can't get to it immediately, as I have other fires to put out related to the merge, but I'll aim to get it done sometime next week.
Flags: needinfo?(seth)
Comment 66•10 years ago
I added annotations to AWSY for this regression.
Whiteboard: [MemShrink:P1] → [MemShrink:P1][awsy+]
Assignee
Comment 67•10 years ago
So bug 1120149 has landed on trunk, and I'm requesting uplift for Aurora. This should be fixed on every affected branch.
Can we resolve this now?
Status: NEW → UNCONFIRMED
Ever confirmed: false
Assignee
Updated•10 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 68•10 years ago
(In reply to Seth Fowler [:seth] from comment #67)
> So bug 1120149 has landed on trunk, and I'm requesting uplift for Aurora.
> This should be fixed on every affected branch.
>
> Can we resolve this now?
AWSY confirms, the fix looks good. I've added an annotation for the fix.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [MemShrink:P1][awsy+] → [MemShrink:P1][awsy+][awsy-]
Comment 69•10 years ago
Marking it as fixed thanks to the patch in bug 1120149.
Comment 70•10 years ago
Marking 37 as fixed as well, since bug 1120149 landed on 37.