Closed Bug 1367339 Opened 7 years ago Closed 6 years ago

[Input Latency] Firefox is 60% (33ms) slower than Chrome in case Facebook - click to open emoji dialog in chat tab

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE
Performance Impact medium

People

(Reporter: mlien, Unassigned)

References

(Blocks 1 open bug, )

Details

(Keywords: perf, Whiteboard: [QRC][QRC_Analyzed])

User Story

STR:
0. Logged in with Whitehat account, timeline page
1. Click on a name in the chat panel on the right, Wait for the chat window to open
2. Click on the emoji icon and wait dialog show up

Gecko Profiles:
1. https://perfht.ml/2qK4Ztp 

Short Gecko Profile(cover only Input Lantency action):
https://perfht.ml/2riq7qY

Reports: https://goo.gl/MqNqwL
Notes: https://docs.google.com/spreadsheets/d/1MsTK1FW88wuLd25A18HG2KqpjYJnvrxGM9a6MeZp35w/edit#gid=256706414
      No description provided.
Whiteboard: [qf] → [qf][QRC_NeedAnalysis]
User Story: (updated)
Ready for QRC profile analysis but not yet ready for QF triage.
Whiteboard: [qf][QRC_NeedAnalysis] → [QRC][QRC_NeedAnalysis]
Summary: [Input Latency] Facebook - click to open emoji dialog in chat tab → [Input Latency] Firefox is 60% (33ms) slower than Chrome in case Facebook - click to open emoji dialog in chat tab
Perfherder [1] shows this as high confidence that Firefox is 51% slower in last 14 days. Worth to dig into the issue.

[1] https://treeherder.mozilla.org/perf.html#/dashboard?topic=hasal&repo=mozilla-central&timerange=1209600
Ray, could you help to profile bug?
Assignee: nobody → ralin
Flags: needinfo?(ralin)
Sure, I'll see what I can do. Thanks.
Flags: needinfo?(ralin)
Sorry for the late. By looking into the profiles, it seems a series of JS callback in content cause the event processing delayed. I didn't see other gecko calls within the stack(in most of the delayed periods), but I'm stuck with the obfuscated code that stops me from seeing more information inside it. It's hard for me to infer which js function call exactly taking more time, perhaps Sean might know better how to diagnose this issue. Thanks.
Flags: needinfo?(sstangl)
I'm not sure how to tell what is taking more time now relative to a historical measurement, since Facebook has changed in the meantime.

However, looking at the inverted JS code, the only thing that stands out to me as obviously bad is that ~10% of time is spent in what appears to be something called the "CavalryLogger", which appears to sample browser status information and report it back to Facebook.
Err, the file appears to have changed... it now only has 18 total lines.

So that isn't so helpful, and we need to re-measure.
Mark, please help Sean for comment 8. Thanks.
Flags: needinfo?(myen)
(In reply to Sean Stangl [:sstangl] from comment #8)
> Err, the file appears to have changed... it now only has 18 total lines.
> 
> So that isn't so helpful, and we need to re-measure.

Hi Sean, I have remeasured this case on the reference device.

Short Gecko Profile(cover only Input Lantency action):
https://perfht.ml/2twhLty
Flags: needinfo?(myen)
Jonathan, can you please do a profile analysis on Mike's updated profiler on comment 10.
Assignee: ralin → jfkthame
Flags: needinfo?(jfkthame)
A couple of things in this profile look interesting. First, it's clear that there is a LOT of javascript going on, which of course is largely out of our control (but anything we do that make JS faster will help).

There are a few long-ish Composite operations, the longest at 57ms. But AFAICS it doesn't look like the compositor itself is doing much here; my suspicion, rather, is that the GPU process may be getting starved because the system is too busy in other ways.

In the middle of the profile, there's a DOMEvent message that apparently takes 746ms(!) to handle. Focusing on this, I see the content process running lots of JS (including a substantial chunk at the end of the interval that is apparently interpreted rather than JIT'ed); but it's perhaps more interesting that the main (chrome) process is also quite busy during this interval. It seems to be doing network activity, and in particular I see net::predictor in a lot of stacks.

So this profile leads me to wonder whether we're doing "too much" prefetching here (for the capabilities of the given machine -- presumably a higher-perf machine with more cores, etc., can do it without problems), with the result that we're starving the content process of the cycles it needs to run JS in a timely manner (as well as perhaps starving the GPU process).

(Unassigning, as this isn't an area I am at all familiar with.)
Assignee: jfkthame → nobody
Flags: needinfo?(jfkthame)
It's hard to say Javascript or DOM per comment 12. Set Component as DOM for triage.

However, based on historical numbers on Perfherder, the gap is 20-30ms between Firefox and Chrome. Not sure whether it's worth to spend efforts or not. Please help to triage. Thanks. 

Win8: https://treeherder.mozilla.org/perf.html#/graphs?timerange=5184000&series=%5Bmozilla-central,df3dfa8e51cbf93d46c4255a85483a4c6b9ad65b,1,9%5D&series=%5Bmozilla-central,179771eff5b2ede5aa2d9431921d22ae2f8aa036,1,9%5D

Win10: https://treeherder.mozilla.org/perf.html#/graphs?series=%5B%22mozilla-central%22,%22ac1fb385ad5752c63f5e9738637a4ed4607f755a%22,1,9%5D&series=%5B%22mozilla-central%22,%22544ca251a3b363188f21d44f960b8537c7201861%22,1,9%5D
Component: General → DOM
Flags: needinfo?(overholt)
Flags: needinfo?(nihsanullah)
Whiteboard: [QRC][QRC_NeedAnalysis] → [qf][QRC][QRC_Analyzed]
(In reply to Bobby Chien [:bchien] from comment #13)
> It's hard to say Javascript or DOM per comment 12. Set Component as DOM for
> triage.
> 
> However, based on historical numbers on Perfherder, the gap is 20-30ms
> between Firefox and Chrome. Not sure whether it's worth to spend efforts or
> not. Please help to triage. Thanks. 
> 
> Win8:
> https://treeherder.mozilla.org/perf.html#/
> graphs?timerange=5184000&series=%5Bmozilla-central,
> df3dfa8e51cbf93d46c4255a85483a4c6b9ad65b,1,9%5D&series=%5Bmozilla-central,
> 179771eff5b2ede5aa2d9431921d22ae2f8aa036,1,9%5D
> 
> Win10:
> https://treeherder.mozilla.org/perf.html#/graphs?series=%5B%22mozilla-
> central%22,%22ac1fb385ad5752c63f5e9738637a4ed4607f755a%22,1,
> 9%5D&series=%5B%22mozilla-central%22,
> %22544ca251a3b363188f21d44f960b8537c7201861%22,1,9%5D

According to comment 12, it seems we need someone from necko team to chime in as comment 12 pointed out "net::predictor" in a lot of stacks and wondering whether we're doing too much "prefetching." That's at necko's wheelhouse.
Was thinking about NI Nicholas for comment 12 and comment 14 but he is not accepting needinfo at the moment. NI Jason instead :)
Flags: needinfo?(jduell.mcbugs)
(In reply to Bobby Chien [:bchien] from comment #13)
> It's hard to say Javascript or DOM per comment 12. Set Component as DOM for
> triage.
> 
> However, based on historical numbers on Perfherder, the gap is 20-30ms
> between Firefox and Chrome. Not sure whether it's worth to spend efforts or
> not. Please help to triage. Thanks. 

What % is the 20-30 ms? Regardless, 20-30 ms doesn't seem worth it to me.
Flags: needinfo?(overholt)
I don't see any mention of the predictor in any of the profiles linked to in this bug (unless I missed one in my scan of the bug). Do we have one?
Flags: needinfo?(jduell.mcbugs) → needinfo?(htsai)
(In reply to Jonathan Kew (:jfkthame) from comment #18)
> The profile from comment 10 was where I noticed it, see
> https://perf-html.io/public/a88dcee0f36f393ba73bccdbf64015908316f1f0/
> calltree/?search=net%3A%3Apredictor

There is definitely a lot of URI parsing happening there, worth filing a bug for it: bug 1390274
Depends on: 1390274
Flags: needinfo?(nihsanullah)
Whiteboard: [qf][QRC][QRC_Analyzed] → [qf:p2][QRC][QRC_Analyzed]
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #17)
> I don't see any mention of the predictor in any of the profiles linked to in
> this bug (unless I missed one in my scan of the bug). Do we have one?

Jonathan, could you please point out the profile you analyzed in comment 12 to Nicholas, thanks?
Flags: needinfo?(htsai) → needinfo?(jfkthame)
(In reply to Andrew Overholt [:overholt] from comment #16)
> (In reply to Bobby Chien [:bchien] from comment #13)
> > It's hard to say Javascript or DOM per comment 12. Set Component as DOM for
> > triage.
> > 
> > However, based on historical numbers on Perfherder, the gap is 20-30ms
> > between Firefox and Chrome. Not sure whether it's worth to spend efforts or
> > not. Please help to triage. Thanks. 
> 
> What % is the 20-30 ms? Regardless, 20-30 ms doesn't seem worth it to me.

Average in last 2 days:
Win7:  Chrome 49.21ms vs Firefox 58.73ms (9.52ms, 19.36%)
Win10: Chrome 20.14ms vs Firefox 53.14ms (33ms, 164%)
Flags: needinfo?(overholt)
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #20)
> Jonathan, could you please point out the profile you analyzed in comment 12
> to Nicholas, thanks?

See comment 18.
Flags: needinfo?(jfkthame)
(In reply to Bobby Chien [:bchien] from comment #21)
> Win10: Chrome 20.14ms vs Firefox 53.14ms (33ms, 164%)

I guess the fact that it's more than twice as long, let's see if Nick can find anything (https://perf-html.io/public/a88dcee0f36f393ba73bccdbf64015908316f1f0/calltree/?search=net%3A%3Apredictor is the profile Jonathan suggests looking at, Nick).
Flags: needinfo?(overholt) → needinfo?(hurley)
That looks like the same stuff as in bug 1390274. Expect more updates for the predictor stuff there in the next couple days.
Flags: needinfo?(hurley)
Keywords: perf
Mike, is this still occurring?
Flags: needinfo?(mlien)
Let's re-open this if/when we see problems here again.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Flags: needinfo?(mlien)
Component: DOM → DOM: Core & HTML
Performance Impact: --- → P2
Whiteboard: [qf:p2][QRC][QRC_Analyzed] → [QRC][QRC_Analyzed]
Flags: needinfo?(sean.stangl)
You need to log in before you can comment on or make changes to this bug.