Closed Bug 1181907 Opened 5 years ago Closed 5 years ago

FF41 becomes very slow after 1-2 hours of use

Categories

(Core :: CSS Parsing and Computation, defect, major)

41 Branch
defect
Not set
major

Tracking

()

RESOLVED FIXED
Tracking Status
firefox41 + wontfix
firefox42 + unaffected
firefox43 --- fixed

People

(Reporter: jmvalin, Assigned: seth)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression)

Attachments

(5 files, 6 obsolete files)

146.78 KB, application/x-targz
Details
1.49 KB, patch
Details | Diff | Splinter Review
1.66 KB, patch
Details | Diff | Splinter Review
15.99 KB, patch
Details | Diff | Splinter Review
111.84 KB, image/jpeg
Details
After upgrading to FF41 (developer edition), I noticed that Firefox is becoming very slow after about 1-2 hours of use. If I restart it, it becomes much faster, even as I reload all tabs I had open. Then after some time, it becomes slow again and I have to restart. Even the GUI itself is slow. For example, if I click on the "Help" menu, it takes close to 1 second before the menu actually shows up. Then as I move the mouse down, items in the menu take close to one second to highlight when I move the mouse over them. I tried both with and without e10s and the same happens. I also tried replacing ABP with ublock and I still get the same behaviour (I disabled all other addons I had except for the profiler). I have extracted some profiler data:
http://people.mozilla.org/~bgirard/cleopatra/#report=0b82534f7e7935544fa20bd5f749aabfa08891c8
http://people.mozilla.org/~bgirard/cleopatra/#report=8e414bc2e21ee54224c88a42ed8b63df23dabdae
http://people.mozilla.org/~bgirard/cleopatra/#report=6bb461873fa7ddbcd7ed2f30fe51a0b7bc86734d
http://people.mozilla.org/~bgirard/cleopatra/#report=d56cdb78f70d58f11763d08b2746230c7f21bfca

Prior to FF41, I sometimes had to restart FF because it was becoming slow, but it was once every few days, not once every few hours. I'm running it on Fedora 22 on a quad-core Lenovo W540 laptop with 16 GB RAM (not exactly a slow machine). I'm not running out of memory and I don't have swap enabled, so it's not a swapping issue.
So I decided to downgrade to FF40 and it's much better. I still feel like it's slowing down a bit over time, but after 24 hours, the "response time" is maybe around 0.2 seconds. It's still a bit annoying, but nowhere near as bad as the 1 second lag I'm getting after just 1-2 hours of use with FF41.
There is a single element that takes an extremely long time to restyle. Unfortunately the profiler doesn't show which element it is (it's a ?). The profile seems to indicate it's not related to drivers or the OS.

I have an identical machine with Fedora 22 and don't run into this problem at all, FWIW.
Also, I tried creating a new profile and it didn't help. Another thing I noticed is that before it becomes noticeably slow, I can see the amount of CPU used by the GUI go up. For example, when I just started Firefox, moving my mouse between menu items (i.e. making them highlight one after the other) takes less than 10% CPU. Over time that amount of CPU slowly goes up, until it takes 100% CPU to just handle the menu item highlighting.
So on Thomas' suggestion, I tried nightly (42) and it's also affected.
about:support data:
https://pastebin.mozilla.org/8841461
Attached file memory-report.json.gz
about:memory
09:48 < jmspeex> Hi, I've been told I might be able to get help here about fixing this bug I filed: 
                 https://bugzilla.mozilla.org/show_bug.cgi?id=1181907
09:48 < firebot> Bug 1181907 — NEW, nobody@mozilla.org — FF41 becomes very slow after 1-2 hours of use
09:49 < jmspeex> I'm a Mozilla employee, but I'm working on the Daala codec, i.e. quite far from FF
09:49 < jmspeex> Anything else I can measure before I restart Firefox again?
09:50 < Gijs> jmspeex: looking.
09:51 < ttaubert> jmspeex: interesting... looks like you spend a lot of time in 
                  nsTArray_Impl<mozilla::WeakPtr<mozilla::image::IProgressObserver>, 
                  nsTArrayInfallibleAllocator>::IndexOf....
09:52 < ttaubert> do we add image progress observers and don't remove them maybe?
09:52 < jmspeex> how would I know?
09:53 < jmspeex> BTW, I believe two of the profiles were just me moving the mouse cursor in the "Help" menu, 
                 making the different items highlight under the mouse
09:53 < jmspeex> (with a lag of about 0.5 to 1 second)
09:54 -!- mixedpuppy [mixedpuppy@moz-igukuv.vc.shawcable.net] has quit [Client exited]
09:54 < ttaubert> jmspeex: on the right there's a list "Most time spent in"
09:54 < ttaubert> and it's at the top
09:55 < Gijs> jmspeex: needinfo :seth on the bug with the info that ttaubert just pointed out
09:56 < jmspeex> I don't understand the info you need
09:56  * Gijs was trying to look at the callsite but can't easily find it (probably some inline thing that got 
          optimized away or whatever)
09:56 < Gijs> jmspeex: I mean you can set the "needinfo" flag on the bug
09:56 < jmspeex> Keep in mind I don't know anything about FF development
09:56 < Gijs> you need a person to request needinfo from
09:56 < Gijs> that person is ":seth"
09:56 < Gijs> (mind the colon; that'll autocomplete/find his email address)
Flags: needinfo?(seth)
(Seth, I realize this is a little rough still, but at least 2 of the profiles point to that ::IndexOf call, and I am hoping you have ideas on how Jean-Marc can narrow down what's causing the issue here.)
Keywords: regression
It looks like this might be the culprit with a huge list of observers:

void
ProgressTracker::AddObserver(IProgressObserver* aObserver)
{
  MOZ_ASSERT(NS_IsMainThread());
  mObservers.AppendElementUnlessExists(aObserver);
}

AppendElementUnlessExists() ends up calling IndexOf() to determine whether the element already exists.
Printing mObservers.Length() in there shows that indeed the list seems to be constantly growing.

1) Add a printf()
2) Open https://people.mozilla.org/~tvyas/mixedboth.html
3) Hit Ctrl+T, then Ctrl+W, and repeat to watch the length grow

I'm now at a 1000 observers after doing that a few dozen times.
Although we probably should call .RemoveElement() from somewhere it might not hurt to automatically remove observers when we iterate to notify and we hit an invalid weak ref? Not sure how those arrays behave if mutated while iterating. nsObserverList deals with it that way at least.
Another weird thing I observed... When FF starts becoming slow, I close tabs and there's an exact number of opened tabs that makes it fast again. For example, on one run, having 11 tabs open was showing 100% CPU use by just moving the mouse in the menu and that CPU would go down to 10% if I closed one tab. Then opening a new tab (even if empty) would make FF slow again, and closing it would make it fast again. I've verified that the number is not constant across runs. I restarted FF and when it became slow again, I tried the same and this time the threshold was between 9 and 10 open tabs instead of between 10 and 11 tabs.

Since tabs appear to be somehow involved, I also tried disabling Custom Tab Width, but it didn't help.
Opening and closing the main menu (the hamburger button in the top right) makes the progress observer count go up very fast. I'm at 50k+ observers right now but can't quite reproduce the slowness that Jean-Marc is seeing.
Well, for me opening and closing the main menu isn't so bad. Try setting FF to show the menu bar. Then click on a menu and move the mouse up and down to make the different items highlight. That's one of the most obvious symptoms for me (though everything else is slow too). Also note that the slow behaviour isn't present on startup. It usually takes 1-2 hours to show up.
(In reply to Tim Taubert [:ttaubert] from comment #11)
> Although we probably should call .RemoveElement() from somewhere it might
> not hurt to automatically remove observers when we iterate to notify and we
> hit an invalid weak ref? Not sure how those arrays behave if mutated while
> iterating. nsObserverList deals with it that way at least.

Looks like all the observers are alive?
Tim, isn't this the same bug we were talking about a while ago? I remember you were investigating it; what did you learn?
Flags: needinfo?(seth) → needinfo?(tnikkel)
I was investigating a specific bug I was seeing with lots of observers on the image for the tab separator. The investigation lead me to try to understand why the style system was creating so many different CSS ImageValue objects for the same image. That's as far as I got before I had to move onto to something else.
Flags: needinfo?(tnikkel)
Assuming that the testcase from bug 1129368 didn't come out of nowhere (ie it come from something a web dev was trying to do) then there are legitimate reasons to have a large number of observers. So we probably then want to convert the observers array to a hash table. But we also don't want to be creating thousands of CSS ImageValue objects for no reason, so the specific cause of the observers should be investigated to see if it needs to be fixed or not too.
Another random observation... The slowdown was happening again and I decided to open a new window (Ctrl-N). Interestingly, the new window was not suffering from the slowdown at all. Then I decided to move the tabs from the original window to the new window one by one, after which the original window got closed (no more tabs). At that point, the new window was still fast.

Two things that also happened while doing that (but I've no idea if it was related):
1) Many of the tabs I would move to the new window did not redraw properly, instead showing a spinning wheel. I had to reload "ctrl-R" them. 
2) A few minutes after I was done moving all tabs to the new window, all of the tabs crashed (I have e10s enabled) and I was asked to restart them. Everything's been fine after restarting them.
I have now started hitting this bug too, sadly. My traces look very similar, though they are much deeper due to e10s. If it's helpful, I can get some cleaner traces as mine do look slightly different.
Depends on: 1194912
Bug 1194912 should greatly reduce the performance impact of having tons of observers. But it'd still be nice to understand why this is happening. The observer "leak" is clearly coming from chrome. A small sample from some logging I did:

*** 0x11fbfeec0 [chrome://browser/skin/tabbrowser/tab-separator.png] Observer count now at 313
*** 0x1388d6640 [chrome://browser/skin/menuPanel-small.png] Observer count now at 176
*** 0x14f48a380 [chrome://browser/skin/urlbar-history-dropmarker.png] Observer count now at 62

These numbers all look *way* too high to me.
I'm focusing on tab-separator.png. This is the stack that leads us to repeatedly clone the imgRequestProxy for this image:

#01: mozilla::css::ImageLoader::LoadImage(nsIURI*, nsIPrincipal*, nsIURI*, mozilla::css::ImageValue*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x404a6f6]
#02: mozilla::css::ImageValue::ImageValue(nsIURI*, nsStringBuffer*, nsIURI*, nsIPrincipal*, nsIDocument*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4106c48]
#03: mozilla::css::ImageValue::ImageValue(nsIURI*, nsStringBuffer*, nsIURI*, nsIPrincipal*, nsIDocument*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x40ffefd]
#04: nsCSSValue::StartImageLoad(nsIDocument*) const[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x40ffd49]
#05: TryToStartImageLoadOnValue(nsCSSValue const&, nsIDocument*, nsCSSValueTokenStream*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x40d6576]
#06: TryToStartImageLoad(nsCSSValue const&, nsIDocument*, nsCSSProperty, nsCSSValueTokenStream*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x40d6479]
#07: TryToStartImageLoad(nsCSSValue const&, nsIDocument*, nsCSSProperty, nsCSSValueTokenStream*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x40d63fa]
#08: MapSinglePropertyInto(nsCSSProperty, nsCSSValue const*, nsCSSValue*, nsRuleData*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x408e696]
#09: nsCSSExpandedDataBlock::MapRuleInfoInto(nsCSSProperty, nsRuleData*) const[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4090364]
#10: (anonymous namespace)::CSSParserImpl::ParsePropertyWithVariableReferences(nsCSSProperty, nsCSSProperty, nsAString_internal const&, mozilla::CSSVariableValues const*, nsRuleData*, nsIURI*, nsIURI*, nsIPrincipal*, mozilla::CSSStyleSheet*, unsigned int, unsi[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x409490d]
#11: nsCSSParser::ParsePropertyWithVariableReferences(nsCSSProperty, nsCSSProperty, nsAString_internal const&, mozilla::CSSVariableValues const*, nsRuleData*, nsIURI*, nsIURI*, nsIPrincipal*, mozilla::CSSStyleSheet*, unsigned int, unsigned int)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4094380]
#12: nsRuleNode::ResolveVariableReferences(nsStyleStructID, nsRuleData*, nsStyleContext*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x413f823]
#13: nsRuleNode::WalkRuleTree(nsStyleStructID, nsStyleContext*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x413fc46]
#14: nsStyleBackground const* nsRuleNode::GetStyleBackground<true>(nsStyleContext*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3dd25d2]
#15: nsStyleBackground const* nsStyleContext::DoGetStyleBackground<true>()[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3dd24cc]
#16: nsStyleContext::StyleBackground()[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3dd0e25]
#17: nsStyleContext::CalcStyleDifference(nsStyleContext*, nsChangeHint, unsigned int*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x416e1b8]
#18: mozilla::ElementRestyler::CaptureChange(nsStyleContext*, nsStyleContext*, nsChangeHint, unsigned int*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x42000a6]
#19: mozilla::ElementRestyler::RestyleSelf(nsIFrame*, nsRestyleHint, unsigned int*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4204233]
#20: mozilla::ElementRestyler::Restyle(nsRestyleHint)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4201cf4]
#21: mozilla::ElementRestyler::RestyleContentChildren(nsIFrame*, nsRestyleHint)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4207e2e]
#22: mozilla::ElementRestyler::RestyleChildren(nsRestyleHint)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x420613a]
#23: mozilla::ElementRestyler::Restyle(nsRestyleHint)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4202563]
#24: mozilla::ElementRestyler::ComputeStyleChangeFor(nsIFrame*, nsStyleChangeList*, nsChangeHint, mozilla::RestyleTracker&, nsRestyleHint, mozilla::RestyleHintData const&, nsTArray<mozilla::ElementRestyler::ContextToClear>&, nsTArray<nsRefPtr<nsStyleContext> >[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4208bc1]
#25: mozilla::RestyleManager::ComputeAndProcessStyleChange(nsIFrame*, nsChangeHint, mozilla::RestyleTracker&, nsRestyleHint, mozilla::RestyleHintData const&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41fbb7d]
#26: mozilla::RestyleManager::RestyleElement(mozilla::dom::Element*, nsIFrame*, nsChangeHint, mozilla::RestyleTracker&, nsRestyleHint, mozilla::RestyleHintData const&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41fb731]
#27: mozilla::RestyleTracker::ProcessOneRestyle(mozilla::dom::Element*, nsRestyleHint, nsChangeHint, mozilla::RestyleHintData const&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4288253]
#28: mozilla::RestyleTracker::DoProcessRestyles()[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x422b71a]
#29: mozilla::RestyleManager::ProcessRestyles(mozilla::RestyleTracker&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4213bd2]
#30: mozilla::RestyleManager::ProcessPendingRestyles()[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41fdf56]
#31: PresShell::FlushPendingNotifications(mozilla::ChangesToFlush)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4303ff6]
#32: PresShell::FlushPendingNotifications(mozFlushType)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4302b8d]
#33: nsDocument::FlushPendingNotifications(mozFlushType)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x1c9bbad]
#34: mozilla::dom::Element::GetPrimaryFrame(mozFlushType)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x1b381a4]
#35: mozilla::dom::Element::GetScrollFrame(nsIFrame**, bool)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x1b38235]
#36: mozilla::dom::Element::ScrollLeft()[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x1b38b8e]
#37: mozilla::dom::ElementBinding::get_scrollLeft(JSContext*, JS::Handle<JSObject*>, mozilla::dom::Element*, JSJitGetterCallArgs)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2b09851]
#38: mozilla::dom::GenericBindingGetter(JSContext*, unsigned int, JS::Value*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2d4270a]
#39: js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x627c8e8]
#40: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x620044d]
#41: js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x61e356d]
#42: js::InvokeGetter(JSContext*, JSObject*, JS::Value, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x6228de0]
#43: CallGetter(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<js::Shape*>, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x6241366]
#44: bool GetExistingProperty<(js::AllowGC)1>(JSContext*, js::MaybeRooted<JSObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<js::Shape*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::Value[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x623d487]
#45: bool NativeGetPropertyInline<(js::AllowGC)1>(JSContext*, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JSObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<jsid, (js::AllowGC)1>::HandleType, IsNameLookup, js::MaybeRoot[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x623d7ff]
#46: js::NativeGetProperty(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x623d5ea]
#47: js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x6270c16]
#48: js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, js::PropertyName*, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x626ddf1]
#49: js::GetProperty(JSContext*, JS::Handle<JS::Value>, JS::Handle<js::PropertyName*>, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x622b360]
#50: GetPropertyOperation(JSContext*, js::InterpreterFrame*, JS::Handle<JSScript*>, unsigned char*, JS::MutableHandle<JS::Value>, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x624a174]
#51: Interpret(JSContext*, js::RunState&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x6218e94]
#52: js::RunScript(JSContext*, js::RunState&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x620e014]
#53: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x6200525]
#54: js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x61e356d]
#55: js::InvokeGetter(JSContext*, JSObject*, JS::Value, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x6228de0]
#56: CallGetter(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<js::Shape*>, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x6241366]
#57: bool GetExistingProperty<(js::AllowGC)1>(JSContext*, js::MaybeRooted<JSObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<js::Shape*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::Value[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x623d487]
#58: bool NativeGetPropertyInline<(js::AllowGC)1>(JSContext*, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JSObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<jsid, (js::AllowGC)1>::HandleType, IsNameLookup, js::MaybeRoot[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x623d7ff]
#59: js::NativeGetProperty(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x623d5ea]
#60: js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x6270c16]
#61: js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, js::PropertyName*, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x626ddf1]
#62: js::GetProperty(JSContext*, JS::Handle<JS::Value>, JS::Handle<js::PropertyName*>, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x622b360]
#63: GetPropertyOperation(JSContext*, js::InterpreterFrame*, JS::Handle<JSScript*>, unsigned char*, JS::MutableHandle<JS::Value>, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x624a174]
#64: Interpret(JSContext*, js::RunState&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x6218e94]
#65: js::RunScript(JSContext*, js::RunState&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x620e014]
#66: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x6200525]
#67: js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x61e356d]
#68: JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x691d974]
#69: mozilla::dom::EventHandlerNonNull::Call(JSContext*, JS::Handle<JS::Value>, mozilla::dom::Event&, JS::MutableHandle<JS::Value>, mozilla::ErrorResult&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2ad3d8f]
#70: void mozilla::dom::EventHandlerNonNull::Call<nsISupports*>(nsISupports* const&, mozilla::dom::Event&, JS::MutableHandle<JS::Value>, mozilla::ErrorResult&, char const*, mozilla::dom::CallbackObject::ExceptionHandling, JSCompartment*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2fe2990]
#71: mozilla::JSEventHandler::HandleEvent(nsIDOMEvent*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2fd513e]
#72: nsXBLPrototypeHandler::ExecuteHandler(mozilla::dom::EventTarget*, nsIDOMEvent*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3bbcca7]
#73: nsXBLEventHandler::HandleEvent(nsIDOMEvent*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3bbbd63]
#74: mozilla::EventListenerManager::HandleEventSubType(mozilla::EventListenerManager::Listener*, nsIDOMEvent*, mozilla::dom::EventTarget*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2fa7fbc]
#75: mozilla::EventListenerManager::HandleEventInternal(nsPresContext*, mozilla::WidgetEvent*, nsIDOMEvent**, mozilla::dom::EventTarget*, nsEventStatus*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2fa8726]
#76: mozilla::EventListenerManager::HandleEvent(nsPresContext*, mozilla::WidgetEvent*, nsIDOMEvent**, mozilla::dom::EventTarget*, nsEventStatus*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2fc2e09]
#77: mozilla::EventTargetChainItem::HandleEvent(mozilla::EventChainPostVisitor&, mozilla::ELMCreationDetector&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2fad211]
#78: mozilla::EventTargetChainItem::HandleEventTargetChain(nsTArray<mozilla::EventTargetChainItem>&, mozilla::EventChainPostVisitor&, mozilla::EventDispatchingCallback*, mozilla::ELMCreationDetector&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2fa1fba]
#79: mozilla::EventDispatcher::Dispatch(nsISupports*, nsPresContext*, mozilla::WidgetEvent*, nsIDOMEvent*, nsEventStatus*, mozilla::EventDispatchingCallback*, nsTArray<mozilla::dom::EventTarget*>*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2fa3640]
#80: mozilla::DelayedEventDispatcher<mozilla::TransitionEventInfo>::DispatchEvents(nsPresContext* const&)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41d5505]
#81: nsTransitionManager::DispatchEvents()[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41d5410]
#82: DispatchAnimationEventsOnSubDocuments(nsIDocument*, void*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41cb4dc]
#83: nsRefreshDriver::DispatchAnimationEvents()[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41cb38f]
#84: nsRefreshDriver::Tick(long long, mozilla::TimeStamp)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41ca3c0]
#85: mozilla::RefreshDriverTimer::TickDriver(nsRefreshDriver*, long long, mozilla::TimeStamp)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41d39d6]
#86: mozilla::RefreshDriverTimer::Tick(long long, mozilla::TimeStamp)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41d38c5]
#87: mozilla::VsyncRefreshDriverTimer::RunRefreshDrivers(mozilla::TimeStamp)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41d377d]
#88: mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::TimeStamp)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41d36f0]
#89: void nsRunnableMethodArguments<mozilla::TimeStamp>::apply<mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver, void (mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp)>(mozilla::VsyncRefreshDriverTimer::RefreshDr[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41d4164]
#90: nsRunnableMethodImpl<void (mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp), true, mozilla::TimeStamp>::Run()[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41d3f1c]
#91: nsThread::ProcessNextEvent(bool, bool*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x19ae60]
#92: NS_ProcessPendingEvents(nsIThread*, unsigned int)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x219ca7]
#93: nsBaseAppShell::NativeEventCallback()[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3dbc786]
#94: nsAppShell::ProcessGeckoEvents(void*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3e3132d]
#95: __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x80a01]
#96: __CFRunLoopDoSources0[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x72b8d]
#97: __CFRunLoopRun[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x721bf]
#98: CFRunLoopRunSpecific[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x71bd8]
#99: RunCurrentEventLoopInMode[/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox +0x3256f]
#100: ReceiveNextEventCommon[/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox +0x321ee]
#101: _BlockUntilNextEventMatchingListInModeWithFilter[/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox +0x3212b]
#102: _DPSNextEvent[/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit +0x918ab]
#103: -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:][/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit +0x90e58]
#104: -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:][/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3e2fe07]
#105: -[NSApplication run][/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit +0x86af3]
#106: nsAppShell::Run()[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3e31d21]
#107: nsAppStartup::Run()[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4e9e221]
#108: XREMain::XRE_mainRun()[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4f662d9]
#109: XREMain::XRE_main(int, char**, nsXREAppData const*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4f66bed]
#110: XRE_main[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4f67032]
#111: do_main(int, char**, nsIFile*)[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/firefox +0x297b]
#112: main[/Users/mfowler/Code/Work/decodepool-scheduling/obj/dist/NightlyDebug.app/Contents/MacOS/firefox +0x1d5d]
I've confirmed that this bug is related to CSS variables. Replacing this line from browser/themes/shared/tabs.inc.css:

>  background-image: var(--tab-separator-image);

With this line:

>  background-image: url(chrome://browser/skin/tabbrowser/tab-separator.png);

Completely eliminates the problem.
(That's what --tab-separator-image evaluates to, by the way. I'm just doing the same substitution that the variable would.)
So the problem in the end is that the data structure that we use to keep
references to ImageValues associated with CSS variables is not keyed by
nsStyleContext, and hence we have no way to ever free any of the ImageValues.
This series of patches will fix that.

This is a preliminary part that modifies nsGenericHashKey<T> to use a free
mozilla::Hash() function instead of calling T::Hash(). The default
implementation of mozilla::Hash() just calls T::Hash(), so all existing code
will continue to work, but this generalizes nsGenericHashKey<T> to work with
types that cannot have methods or can't be modified. Critically, this makes it
work with enums, which we'll take advantage of in later parts.

The pattern here - a "default" template which can be overridden by adding a
template specialization - is reminiscent of STL functions like std::swap().
Attachment #8652120 - Flags: review?(cam)
Assignee: nobody → seth
Status: NEW → ASSIGNED
In part 2 we take advantage of part 1 to make nsCSSProperty hashable.
Attachment #8652121 - Flags: review?(cam)
This part is purely mechanical. In part 4, we'll need access to an
nsStyleContext in nsCSSExpandedDataBlock::MapRuleInfoInto(), so we need to
propagate it through the intervening functions. This patch does that, but
doesn't actually change anything functional.
Attachment #8652122 - Flags: review?(cam)
OK, this is the patch that really fixes the problem. Unfortunately it's a bit
bigger than the other patches!

To track the ImageValues associated with CSS variables, this patch adds
CSSVariableImageTable. Since it's a static, process-wide service, it's
implemented as a namespace containing free functions.

Conceptually, CSSVariableImageTable is a map:

  (nsStyleStruct, nsCSSProperty) -> nsTArray<ImageValue>

In practice, it's a hash table containing a hash table containing an array. This
makes it easier to manipulate the storage associated with each property of each
style struct individually, which is useful since in practice we update the
ImageValues for each property separately.

The procedure for updating the ImageValue array is that the caller calls
CSSVariableImageTable::ReplaceAll(), passing in a lambda that calls
CSSVariableImageTable::Add() as many times as necessary. ReplaceAll() is
responsible for intelligently managing the storage used for the ImageValues. In
particular, it tries to avoid freeing and reallocating the ImageValue array when
nothing has changed, by reusing the same ImageValue array storage. It
optimistically assumes that the number of ImageValues won't change, and it
compacts the array or frees it completely afterwards if that proves not to be
true.

We also need a way to free the ImageValue arrays associated with an
nsStyleStruct when the nsStyleStruct is destroyed. That's accomplished by
CSSVariableImageTable::RemoveAll(), which is called by nsStyleStruct's
destructor.

I fiddled around quite a bit with this patch to try to make it as clean as
possible at the callsite. It's hard to make the design cleaner without, for
example, requiring that we always allocate a new ImageValue array in
ReplaceAll(), even if Add() never gets called.
Attachment #8652124 - Flags: review?(cam)
Comment on attachment 8652120 [details] [diff] [review]
(Part 1) - Make it possible to use nsGenericHashKey with free functions

Review of attachment 8652120 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/glue/nsHashKeys.h
@@ +653,5 @@
>  
> +namespace mozilla {
> +
> +template <typename T> PLDHashNumber
> +Hash(const T& aValue)

Nit: per https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#Methods this should be:

template<typename T>
PLDHashNumber
Hash(Const T& aValue)
Attachment #8652120 - Flags: review?(cam) → review+
Comment on attachment 8652121 [details] [diff] [review]
(Part 2) - Make nsCSSProperty hashable

Review of attachment 8652121 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/style/nsCSSProperty.h
@@ +67,5 @@
>  };
>  
> +namespace mozilla {
> +
> +template<> inline PLDHashNumber

"inline PLDHashNumber" on a new line.
Attachment #8652121 - Flags: review?(cam) → review+
Comment on attachment 8652122 [details] [diff] [review]
(Part 3) - Propagate a style context from nsRuleNode::ResolveVariableReferences to nsCSSExpandedDataBlock::MapRuleInfoInto

Review of attachment 8652122 [details] [diff] [review]:
-----------------------------------------------------------------

I think this isn't needed; I forgot that on nsRuleData we already have the style context pointer, so there's no need to pass it down separately.
Attachment #8652122 - Flags: review?(cam) → review-
Comment on attachment 8652124 [details] [diff] [review]
(Part 4) - Add CSSVariableImageTable and use it to store ImageValues generated by CSS variables

Review of attachment 8652124 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/style/CSSVariableImageTable.h
@@ +2,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +/**

Please add a one line comment to the top of this file that briefly describes what a CSSVariableImageTable is, like:

/* a table that maintains references to blah blah ... */

This summary appears on mxr when browsing a directory.  (See for example nsStyleContext.h's comment.)

@@ +36,5 @@
> +#include "nsStyleContext.h"
> +#include "nsTArray.h"
> +
> +namespace mozilla {
> +namespace css {

We are trying to avoid any more use of the mozilla::css namespace.  At some point in the future we'll move it all down into mozilla.  So let's just put CSSVariableImageHashtable in namespace mozilla.  (Particularly since it already has "CSS" in its name!)

@@ +119,5 @@
> +      }
> +    } else {
> +      // We still have a non-empty image array for this property. Compact the
> +      // storage it's using if possible.
> +      imageList->Compact();

Is this really important to do?  I'm not sure we'll often get into the situation where we grow the imageList for a given (style context, property) pair, remove many of its entries, then keep hanging on to it.

@@ +153,5 @@
> +  if (!imageList) {
> +    imageList = new detail::ImageValueArray();
> +    perPropertyImageTable->Put(aProp, imageList);
> +  }
> +  

Trailing white space.
Attachment #8652124 - Flags: review?(cam) → review+
Comment on attachment 8652124 [details] [diff] [review]
(Part 4) - Add CSSVariableImageTable and use it to store ImageValues generated by CSS variables

Review of attachment 8652124 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/style/nsCSSDataBlock.cpp
@@ +143,3 @@
>      if (ShouldStartImageLoads(aRuleData, aProp)) {
>          nsIDocument* doc = aRuleData->mPresContext->Document();
> +        TryToStartImageLoad(*aValue, doc, aContext, aProp);

Pass in aRuleData->mStyleContext instead of aContext, per the comment 32 comment.
Thanks for the reviews! I'll upload updated versions of these patches shortly.

(In reply to Cameron McCormack (:heycam) (busy until Sep 1) from comment #33)
> @@ +119,5 @@
> > +      }
> > +    } else {
> > +      // We still have a non-empty image array for this property. Compact the
> > +      // storage it's using if possible.
> > +      imageList->Compact();
> 
> Is this really important to do?  I'm not sure we'll often get into the
> situation where we grow the imageList for a given (style context, property)
> pair, remove many of its entries, then keep hanging on to it.

I agree that this situation will probably rarely come up, but I figure that it's nice to be able to reclaim that storage, and Compact() boils down to two quick |if| checks if no compaction is needed. No compaction being needed should be the common case, since I suspect the *vast* majority of the time, the number of images in the list ends up being exactly the same.
Addressed review comments for part 1.
Attachment #8652120 - Attachment is obsolete: true
Addressed review comments for part 2.
Attachment #8652121 - Attachment is obsolete: true
Addressed review comments for part 3. (I did retain the call to Compact(), though.)
Attachment #8652122 - Attachment is obsolete: true
Attachment #8652124 - Attachment is obsolete: true
Updated the patch to only call CSSVariableImageTable::Add() in
TryToStartImageLoad() if we're operating on a eCSSUnit_TokenStream target. That
used to be implicit in whether we passed in a non-null style context, but now we
always have a context, so we need to check explicitly. This issue is responsible
for the orange in the try job above.
Attachment #8652662 - Attachment is obsolete: true
Try looks good. This updated version of part 3 just rewords a comment. Other
than that, we're ready to land whenever inbound reopens.
Attachment #8652935 - Attachment is obsolete: true
Blocks: 1199054
Does this mean I can just download nightly to check whether it indeed fixes my problem?
It'll be in tomorrow's Nightly.
(In reply to Jean-Marc Valin (:jmspeex) from comment #45)
> Does this mean I can just download nightly to check whether it indeed fixes
> my problem?

Please do check tomorrow's Nightly and report back! We'd really like to know if there's still a problem.
[Tracking Requested - why for this release]: This problem is pretty serious; Firefox can slow down after a couple of hours of use to the point where interactions with the UI can have more than a second of lag. It'd be ideal not to ship a Firefox release that contains this bug.
I've been running nightly for about 6 hours without restarting. No slowdown so far, but I'll keep testing to make sure the problem's really fixed.
Friday's nightly is still running without restart and without being slow, so I confirm this has indeed fixed the problem I've been seeing. Thanks.
Seth, I think we should uplift that to aurora asap to increase the testing coverage. After, we will be able to uplift that to beta with more confidence. Thanks
Flags: needinfo?(seth)
Instead of uplifting this we could also uplift a fix that removes css variables use from the tab seperator. That was a change that landed in june I think. Probably much less risky.
If you mean that there was a "fix" for this bug that landed in nightly in june, then I can definitely confirm it didn't fix the problem I was seeing since nightly from end of July still had the slowdown.
I meant that the original cause for this issue landed about June. (The cause being the use of css variables for the tab separator image.)
(In reply to Timothy Nikkel (:tn) from comment #55)
> I meant that the original cause for this issue landed about June. (The cause
> being the use of css variables for the tab separator image.)

Is the tab separator image the only user of CSS variables in browser.xul, though? I'm pretty sure I saw many others. This may have been an existing problem that just got exacerbated by the tab separator image change.

(Which isn't to say that your plan is a bad one, but I feel more confident that the problem is *truly* fixed if we actually uplift the patch.)
Flags: needinfo?(seth)
Looks like bug 1173730 was the culprit for tab-separator.png specifically.
Blocks: 1173730
Seth, do we need to uplift a fix to Beta41 and Aurora42? Thanks.
Flags: needinfo?(seth)
Comment on attachment 8652659 [details] [diff] [review]
(Part 1) - Make it possible to use nsGenericHashKey with free functions

Approval Request Comment
[Feature/regressing bug #]: Bug 1173730.
[User impact if declined]: The Firefox UI slows down drastically over time, leading to interface lag of over a second after a few hours.
[Describe test coverage new/current, TreeHerder]: On m-c since 8/27.
[Risks and why]: Medium risk. The changes only affect CSS variables, which are not commonly used, but they're fairly invasive changes.
[String/UUID change made/needed]: None.
Flags: needinfo?(seth)
Attachment #8652659 - Flags: approval-mozilla-aurora?
Comment on attachment 8652661 [details] [diff] [review]
(Part 2) - Make nsCSSProperty hashable

Approval Request Comment

Requesting uplift for part 2 as well.
Attachment #8652661 - Flags: approval-mozilla-aurora?
Comment on attachment 8653209 [details] [diff] [review]
(Part 3) - Add CSSVariableImageTable and use it to store ImageValues generated by CSS variables

Approval Request Comment

Requesting uplift for part 3 as well.
Attachment #8653209 - Flags: approval-mozilla-aurora?
(In reply to Seth Fowler [:seth] [:s2h] from comment #56)
> Is the tab separator image the only user of CSS variables in browser.xul,
> though? I'm pretty sure I saw many others. This may have been an existing
> problem that just got exacerbated by the tab separator image change.
> 
> (Which isn't to say that your plan is a bad one, but I feel more confident
> that the problem is *truly* fixed if we actually uplift the patch.)

Presumably the larger bug was introduced by bug 950436 in March 2014. No one noticed until the tab-separator used css variables for the image over a year later. I think the tab-separator was one of the first images in a css variable in our chrome, but I'm guessing it's not hard to capture them all safely. Just pointing out that there is an alternate approach for beta in case the risk was too much for this patch.
(In reply to Ritu Kothari (:ritu) from comment #58)
> Seth, do we need to uplift a fix to Beta41 and Aurora42? Thanks.

Yes. Per comment 52, I'm requesting Aurora uplift first, and then Beta if that goes well.
Blocks: 950436
As I pointed out earlier, FF40 still had some issues for me, even though it wasn't as bad as FF41. So it would be nice if this fix got applied to 42 and 41.
Heh, from talking to Tim Nguyen, ironically we recently removed the CSS variable usage for tab-separator.png because it caused other regressions. That happened in bug 1198236, which got folded into bug 1189212, which got uplifted to Aurora and Beta.

Jean-Marc, can you still reproduce the problem on Beta? If not, then I don't think we necessarily need to uplift this patch to Beta.

We *should* still uplift to Aurora, though, as there are other uses of CSS variables in the UI.
(In reply to Jean-Marc Valin (:jmspeex) from comment #64)
> As I pointed out earlier, FF40 still had some issues for me, even though it
> wasn't as bad as FF41. So it would be nice if this fix got applied to 42 and
> 41.

Hmm, so that suggests that even before the tab-separator.png change, you were still noticing issues... That might argue that we should uplift this patch to beta.
Flags: needinfo?(jmvalin)
(In reply to Seth Fowler [:seth] [:s2h] from comment #66)
> Hmm, so that suggests that even before the tab-separator.png change, you
> were still noticing issues... That might argue that we should uplift this
> patch to beta.

No, wait, my mistake. Looks like the tab-separator.png change landed in 40, so that makes sense.
OK, how about I switch back to aurora and you guys uplift whatever you want me to run/test there?
Flags: needinfo?(jmvalin)
(In reply to Jean-Marc Valin (:jmspeex) from comment #68)
> OK, how about I switch back to aurora and you guys uplift whatever you want
> me to run/test there?

It'd be enough to just check today's Aurora and see whether you can reproduce the problem there. Is Aurora OK right now?
Flags: needinfo?(jmvalin)
Did anything just change in Aurora? I was using it until august 27th (up-to-date version) and it still had the bug?
Flags: needinfo?(jmvalin)
(In reply to Jean-Marc Valin (:jmspeex) from comment #70)
> Did anything just change in Aurora? I was using it until august 27th
> (up-to-date version) and it still had the bug?

Yeah, bug 1189212 landed in Aurora on August 28th, so unfortunately you'd need to re-test now.
Comment on attachment 8652659 [details] [diff] [review]
(Part 1) - Make it possible to use nsGenericHashKey with free functions

As Sylvestre alluded, the sooner we uplift to Aurora42 and stabilize, we will feel more confident taking it into Beta41. This patch looks straightforward.
Attachment #8652659 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8652661 [details] [diff] [review]
(Part 2) - Make nsCSSProperty hashable

Aurora42+
Attachment #8652661 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8653209 [details] [diff] [review]
(Part 3) - Add CSSVariableImageTable and use it to store ImageValues generated by CSS variables

Aurora42+
Attachment #8653209 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Seth, given that this is a bad regression (I have run into it myself a few times), is it easy to automated test(s) for this issue so we can catch it sooner? Just wondering.
Flags: needinfo?(seth)
Component: General → CSS Parsing and Computation
Product: Firefox → Core
Target Milestone: Firefox 43 → ---
Severity: normal → major
Keywords: perf
OS: Linux → All
Hardware: x86_64 → All
Been running Aurora 2015-08-31 for a bit more than a day without experiencing slowdown, so it looks like it's OK now. I'll keep running Aurora (that's what I run normally) and report any slowdown I experience. Thanks for resolving this.
(In reply to Ritu Kothari (:ritu) from comment #75)
> Seth, given that this is a bad regression (I have run into it myself a few
> times), is it easy to automated test(s) for this issue so we can catch it
> sooner? Just wondering.

It's really hard for me to do. This is a performance problem, really, so this is more in the area of the Talos folks. Joel, it seems like we don't have an existing test that would catch this; can we add one?
Flags: needinfo?(seth) → needinfo?(jmaher)
(In reply to Jean-Marc Valin (:jmspeex) from comment #76)
> Been running Aurora 2015-08-31 for a bit more than a day without
> experiencing slowdown, so it looks like it's OK now. I'll keep running
> Aurora (that's what I run normally) and report any slowdown I experience.
> Thanks for resolving this.

Thanks for verifying that things look good on Aurora. I really appreciate you reporting this and then pushing on it until it got fixed. This was a really big deal, and I'm glad we managed to avoid shipping it!
Joel, just to clarify: we'd catch problems like these with a test that repeated the same UI action (in this case, opening and closing a tab) many times. In this bug, every time you opened or closed a tab, things got a little slower, which added up to huge delays over time.
interesting bug and I am glad that :jmspeex could help pinpoint the problem!

Regarding opening/closing tabs- we have TART which does a lot of tab opening/closing to measure the animation of the tabs.  This is done quickly- maybe we should ensure that we hit the proper number of open tabs.  We have another test tps which had 51 tabs with real pages (cached copies) loaded.  We load the tabs and switch between them.

I am open to ideas of making TART or tps do what we need, maybe even creating a new test.  The key would be is what would we measure?
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #80)
> I am open to ideas of making TART or tps do what we need, maybe even
> creating a new test.  The key would be is what would we measure?

So a TART-like test sounds like it would reproduce the problem in this bug if it continued for long enough. I suspect the problem is that issues like we have in this bug take a long time to become obvious from timing measurements, and the tests don't continue opening and closing tabs for long enough. So one option would be to make a TART-like test that just continues to open and close tabs for much longer - maybe thousands of times.

We might be able to catch the problem with less iteration just by measuring the time it takes to open and close each individual tab and trying to detect if the delays are gradually increasing. I'm not sure how feasible that kind of analysis is within the framework of Talos, though.
One issue is that I've never been able to figure out exactly what makes this bug trigger. In the past I've tried opening and closing a large number of tabs, or opening up many pages and I couldn't really make it trigger faster. There *seemed* to be a correlation with the number of open tabs and the use of pages with lots of JS, but it's hard to say more. The bug was also acting in a weird way, where by closing tabs down to e.g. 9 tabs (that number would be different from one run to another) would make FF fast again, and then just opening up a 10th tab again would make it slow (and closing again would make it fast). Overall, I've not been able to extract any 100% repeatable pattern for it despite trying hard. Maybe now that the bug is understood, its would be easier?
BTW, I'm still running the 2015-08-31 aurora without any speed problem (have not restarted it once). I think it's safe to say it's no longer affected. Will now restart to track more recent builds.
avi, could you weigh in regarding comment 81 on if we could realistically run tart much longer and determine if we had a slowdown in new tab creation times?
Flags: needinfo?(avihpit)
I haven't tried running TART for so long, but I don't see any inherent limitation within TART itself. The number of repetitions (currently 25 'tppagecycles') is configured outside of TART, and each iteration lasts roughly 30s, overall ~12 mins. We could use instead 250 iterations which would run for roughly two hours.

However, I see two main issues with this:

1. All of our tests run for few minutes at most, so making one test run for two hours could.. use much resources and just take long to complete, possibly making some analysis tasks harder.

2. We don't currently have metrics which measure performance change over time, so we'd need to come with a good one for this. We could use the current metric (median) and it could also reflect possibly higher average due to longer run time, but what we're really after here is how are the last iterations compare with the first ones. Or maybe do linear regression and report the slope, etc.


But even if we don't make this change as part of talos, nothing prevents anyone from running TART locally for two hours, either via talos or plainly as an addon. The number of iterations is configurable and the data (of each iteration and the overall summary) is displayed at the browser once it's done.
Flags: needinfo?(avihpit)
Also, everything I said at comment 85 would apply equally well IMO to most other talos tests. If Firefox performance regress over time, then I don't see why TART would reflect such change better than other tests we have, and the number of iteration is configurable for all tests.
Seth, is this ready for uplift to Beta? I hope so since we gtb Beta9 on Thursday.
Flags: needinfo?(seth)
(In reply to Jean-Marc Valin (:jmspeex) from comment #82)
> One issue is that I've never been able to figure out exactly what makes this
> bug trigger. In the past I've tried opening and closing a large number of
> tabs, or opening up many pages and I couldn't really make it trigger faster.
> There *seemed* to be a correlation with the number of open tabs and the use
> of pages with lots of JS, but it's hard to say more. The bug was also acting
> in a weird way, where by closing tabs down to e.g. 9 tabs (that number would
> be different from one run to another) would make FF fast again, and then
> just opening up a 10th tab again would make it slow (and closing again would
> make it fast). Overall, I've not been able to extract any 100% repeatable
> pattern for it despite trying hard. Maybe now that the bug is understood,
> its would be easier?

I'm not really sure why some of the weird behavior you're describing happened; perhaps there are additional bugs here. The issue I fixed could be reproduced easily just by printing the number of observers each image has every time an observer is added or removed. For some of the UI images, it only took a few seconds of opening and closing tabs to get over a hundred observers. The number of observers had to get much higher before you could really see the slowness in the UI, but by watching the observers you could see the problem straight away.
Flags: needinfo?(seth)
(In reply to Avi Halachmi (:avih) from comment #86)
> Also, everything I said at comment 85 would apply equally well IMO to most
> other talos tests. If Firefox performance regress over time, then I don't
> see why TART would reflect such change better than other tests we have, and
> the number of iteration is configurable for all tests.

I'm not sure whether it'd actually be necessary to run for two hours. If we could report a potential issue if each iteration seems to be taking longer than the previous one, we might've caught this issue a lot faster.
(In reply to Ritu Kothari (:ritu) from comment #87)
> Seth, is this ready for uplift to Beta? I hope so since we gtb Beta9 on
> Thursday.

Since I don't see any evidence in this bug that this ever landed on Aurora, I don't know that it's a good idea at this point.

Comment 76 suggests that bug 1189212, which has landed on Aurora and Beta, has patched over the immediate problem. Given that that's the case, we're probably better off just letting this ride the trains at this point.
(In reply to Seth Fowler [:seth] [:s2h] from comment #89)
> I'm not sure whether it'd actually be necessary to run for two hours. If we
> could report a potential issue if each iteration seems to be taking longer
> than the previous one, we might've caught this issue a lot faster.

The data is typically too noisy to report such value and compare it between runs IMO.

However, I did try to analyze the TART data from cset 07ff9769e2023dcc8a07f5e7df0febb05c82a446 for win7 opt over 3 subtests.

Each value represents the average interval between animation frames over a tab open animation (so lower is better), where "simple" is of opening a new tab of about:blank, preload-no is opening about:newtab without newtab preload, and preload-yes - with newtab preload.

The values are ordered. Each TART run does all subtests, so at the list below the first values at each line happen at the beginning of the run, and the last values at each line happen ~10 minutes later:

> simple-open-DPI1.all,4.000308289247401,4.003781195949106,3.9894151862929847,3.9899984442669414,4.497053534753861,3.92294328212738,4.002324293641483,3.9568356258281763,4.030819191861508,4.05838094896345,4.101187965763149,4.063212401831328,4.072216311497475,4.084796489174686,4.216088933211107,4.101262508933224,4.058850838857539,4.087289575320571,4.151853843168779,4.105027956749076,4.138582663102583,4.01950811287936,3.999008510423743,4.069367857121709,4.103130899258514
> newtab-open-preload-no.all,10.68524012198815,6.285581025210294,6.251387850804762,6.1175624900394014,6.235190153121948,6.083258734809028,6.239414057948372,6.121368334028456,19.643543362617493,6.296324236826464,6.363682724708735,6.415531236071919,6.3925896633503045,6.611865554537092,6.253456175327301,6.236377017064528,6.289682838049802,6.452752057896104,6.41246810070304,6.306491402062503,6.457922691522643,6.2774821086363355,6.45608309812324,6.376220625500347,6.757049514026177
> newtab-open-preload-yes.all,5.928750571997269,6.304580146616155,6.260263426737352,6.143392647637262,6.283294493501836,6.146677917904324,6.287780290300196,6.226288443261927,6.143350018395318,6.38499010995377,6.432802100514257,6.704554429868373,6.45281457901001,6.22164555571296,6.407610144726066,6.445545457130255,6.461816166722497,6.571518494969323,6.596277185848781,6.4235326301219855,6.789080503510266,6.437936965809312,6.54059762614114,6.297816596247933,6.6009307986214045

After graphing those, it seems to me that:
- preload-no is super flat except for what appears to be a single outlier.
- simple: relatively noisy, not sure I can see a trend clearly over these specific values.
- preload-yes: noisy, but it does appear to me to have a trend - worsening over time.

Overall, I'd say this direction (of analyzing trends within a single talos test run) is worth exploring further.
Attached image tart-run-trends.jpg
Graphs for the data in comment 91: individual sub tests, and all of them on the same graph with same scale.
Wes, could you please uplift to Aurora? If it looks good, we may consider taking a fix in RC for this.
Flags: needinfo?(wkocher)
Seth, I was told that this got missed from Aurora uplifts because the target milestone was changed by Dao. Anyway, I think it still makes sense to take this in Aurora42. What do you think?
Flags: needinfo?(seth)
(In reply to Avi Halachmi (:avih) from comment #91)
> After graphing those, it seems to me that:
> - preload-no is super flat except for what appears to be a single outlier.
> - simple: relatively noisy, not sure I can see a trend clearly over these
> specific values.
> - preload-yes: noisy, but it does appear to me to have a trend - worsening
> over time.

Interesting results. I don't seem to have that revision locally so I can't compare; hopefully that revision includes neither the patches in this bug, nor the changes made in bug 1189212. I'm surprised that preload-no vs preload-yes matters at all...
Flags: needinfo?(seth)
(In reply to Ritu Kothari (:ritu) from comment #94)
> Seth, I was told that this got missed from Aurora uplifts because the target
> milestone was changed by Dao. Anyway, I think it still makes sense to take
> this in Aurora42. What do you think?

Sure, I think taking it in Aurora is relatively low risk. Just didn't want to push to Beta without testing it out in Aurora first.
(In reply to Seth Fowler [:seth] [:s2h] from comment #95)
> Interesting results. I don't seem to have that revision locally so I can't
> compare; hopefully that revision includes neither the patches in this bug,
> nor the changes made in bug 1189212. I'm surprised that preload-no vs
> preload-yes matters at all...

I'm not sure there's a real difference TBH, look at my attachment at comment 92. I think the outlier at preload-no increased the range of the graph thus making the change over time much smaller at the image space.

The preload-yes and preload-no look pretty much overlapping when displayed on the same graph (same attachment), so we can guess that without the outlier, preload-no might have exhibited similar regression over time.

Like I said, it looks interesting, and IMO it's worth exploring this kind of analysis approach further, though I don't have the bandwidth currently to handle it myself.
(In reply to Wes Kocher (:KWierso) from comment #99)
> Backed out in
> https://hg.mozilla.org/releases/mozilla-aurora/rev/ada03589e5ea to see if it
> fixes the Marionette bustage:
> https://treeherder.mozilla.org/logviewer.html#?job_id=1131962&repo=mozilla-
> aurora

The backout Mn jobs were green.

Interestingly, the Linux PGO jobs on the original landing were green, too...
Just FYI that two of us are experiencing a significant slow down on Release build too. Running 40.0.3 on W10 and on MacBook Air on latest OS. 

I was working in a Google Doc for some time and then suddenly everything became really slow. Probably over 1 sec delay on any action. I had over 10 tabs open.
(In reply to Fabio Rios [:frios] from comment #101)
> Just FYI that two of us are experiencing a significant slow down on Release
> build too. Running 40.0.3 on W10 and on MacBook Air on latest OS. 
> 
> I was working in a Google Doc for some time and then suddenly everything
> became really slow. Probably over 1 sec delay on any action. I had over 10
> tabs open.

Can you reproduce this on Nightly?
Flags: needinfo?(seth) → needinfo?(frios)
(In reply to Wes Kocher (:KWierso) from comment #99)
> Backed out in
> https://hg.mozilla.org/releases/mozilla-aurora/rev/ada03589e5ea to see if it
> fixes the Marionette bustage:
> https://treeherder.mozilla.org/logviewer.html#?job_id=1131962&repo=mozilla-
> aurora

This suggests that we probably shouldn't take the risk of uplifting this unless there's evidence of a different regression. Since bug 1189212 was already uplifted, the regression stemming from the use of CSS variables in the UI is fixed.

Comment 101 is interesting, though; if it turns out that Google Docs triggers the same issue, that's quite significant. (FWIW, though, at this time I don't see any evidence to suggest that it's the same issue rather than a completely unrelated one.)
Based on comment 90 and my own use, I think the problem may not be so severe anymore. Also, given that this fix was backed out from Aurora, this fix is not safe to uplift to Beta41. I agree with Seth's comment 103. This bug is now a wontfix for 41.
Depends on: 1205486
Flags: needinfo?(frios)
Seth, do you have plans for 42? Thanks
Flags: needinfo?(seth)
(In reply to Sylvestre Ledru [:sylvestre] from comment #105)
> Seth, do you have plans for 42? Thanks

The changes to frontend CSS that made this necessary were backed out of 42, so I don't think we need to uplift this to 42 at this time. 43 will be the first release that both uses CSS variables for images in the frontend, and contains this fix, so it's fine to just let it ride the trains.

If we do get reports of users running into this issue on 42, though, we can consider uplifting.
Flags: needinfo?(seth)
OK, thanks.
If I understand correctly, 42 is unaffected by this bug.
You need to log in before you can comment on or make changes to this bug.