Closed Bug 1341437 Opened 7 years ago Closed 7 years ago

Activity Streams causes a lot of UI jank

Categories

(Firefox :: New Tab Page, defect, P4)

defect

Tracking

()

RESOLVED FIXED
Firefox 57
Performance Impact none
Tracking Status
firefox57 --- fixed
firefox58 --- affected

People

(Reporter: ehsan.akhgari, Unassigned)

References

(Blocks 1 open bug)

Details

I have been seeing a lot of work happening on the parent process's main thread that are triggered by activity streams code.

Here is an example profile of my browsing session: https://perfht.ml/2l59ola

Here, the activity streams code is overall running for 2.4 *seconds*.  This is many many separate chunks, a lot of them taking much longer than a frame budget.

The top offenders are:
 - parseHTMLText from MetadataParser.js:40 (872ms)
 - _processLinks from PlacesProvider.js:686 (247ms)
 - asyncGetEnhancedLink from PreviewProvider.js:175 (292ms)
 - _executePlacesQuery from PlacesProvider.js:788 (226ms)
 - ColorAnalyzer_onImageLoad from ColorAnalyzer.js:41 (143ms)
 - asyncGetMetadataByCacheKey from MetadataStore.js:439 (89ms)

There is probably a lot more information to be found from this profile, I only spent a few minutes looking at it.
BTW I'm running version 1.4.1 of the add-on (updated Feb 15, 2017).
Thanks for doing the profiling. Do you remember how many and what pages you loaded? We started looking at some talos results, and it seemed like some large svg page tests were causing problems with the metadata parsing.
(In reply to Ed Lee :Mardak from comment #2)
> Thanks for doing the profiling. Do you remember how many and what pages you
> loaded?

It's hard to say, I grabbed this profile from my real profile and I constantly open and close pages so I can't say any more.

I suggest profiling on your own as well to get a better sense of what problems we have with how many and what kinds of pages.

> We started looking at some talos results, and it seemed like some
> large svg page tests were causing problems with the metadata parsing.

I'm not sure what kind of metadata is being parsed and why it's being done this way, but calling parseHTMLText() like the code currently does is never going to be fast since it does a synchronous parse.  <https://github.com/mozilla/activity-stream/blob/e7e3238f79e330003b064f535ee6739bfb1cbec7/addon/MetadataParser.js#L17>

Another example, have queries like this really been optimized to ensure that they run fast with a typical user's places database?  <https://github.com/mozilla/activity-stream/blob/e7e3238f79e330003b064f535ee6739bfb1cbec7/addon/PlacesProvider.js#L367>  Things like that need to be vetted by the current mozStorage peers.

Another example, this code <https://github.com/mozilla/activity-stream/blob/e7e3238f79e330003b064f535ee6739bfb1cbec7/addon/PlacesProvider.js#L663> is going to be inherently slow when the favicon byte size is large.

Another one, this code <https://github.com/mozilla/activity-stream/blob/e7e3238f79e330003b064f535ee6739bfb1cbec7/addon/ColorAnalyzer.js#L46> is probably going to be slow for large images, and those are pretty common on the Web.  This kind of thing should really be done in C++, and hopefully when we do the normal work of decoding the image for displaying.

So I think at least in some cases we're doing work that is inherently slow on the main thread, and we need to rewrite the code to avoid that.

I don't think Talos is a particularly useful tool to measure the impact of Activity Streams, since for example it won't have a profile with a sizeable user history, so the performance issues that result from the places queries won't show up there.
Sounds like we must ensure to not ship Activity Streams before performance issues are fixed.
(In reply to :Ehsan Akhgari from comment #3)
> I don't think Talos is a particularly useful tool to measure the impact of
> Activity Streams, since for example it won't have a profile with a sizeable
> user history, so the performance issues that result from the places queries
> won't show up there.

Would it make sense to have perf tests specific to this feature that use a test profile loaded with history?

Besides preventing regressions, this would give the team working on this feature a more specific performance target to aim for in the initial landing.
(In reply to Robert Helmer [:rhelmer] from comment #5)
> (In reply to :Ehsan Akhgari from comment #3)
> > I don't think Talos is a particularly useful tool to measure the impact of
> > Activity Streams, since for example it won't have a profile with a sizeable
> > user history, so the performance issues that result from the places queries
> > won't show up there.
> 
> Would it make sense to have perf tests specific to this feature that use a
> test profile loaded with history?

That will fix the problem I mentioned above (let's pretend for 1 second that we know what specific preloaded history is a good test case for this feature which is extremely hard) but not other problems.  For example, Talos probably never loads a new tab page in a profile that we load pages into (such as the Tp5 test.)   I don't know the full list of user interactions that Activity Streams is adding but I bet Talos doesn't cover all (most?) of them either.

> Besides preventing regressions, this would give the team working on this
> feature a more specific performance target to aim for in the initial landing.

We don't really need Talos for solid performance targets, there are many other ways to define them.  For example:

* Never do blocking work on the parent process's main thread.
* Never do work that can be expensive depending on things that Web content can control (such as images or HTML pages which can be arbitrary large) on any main thread, neither the parent nor the content process.
* Don't run unoptimized SQlite queries.
* Don't repeat work around what the browser already does (such as parsing HTML).
* Provide a detailed analysis of the performance cost of this code using a sampling profiler such as https://perf-html.io from real world usage.  (For example, profiling a browser with this extension should show very little if any additional work on the parent process' main thread coming from the Activity Streams extension.)
* Provide telemetry probes where the manual analysis is impractical or where we expect variance across the real world user population.

This is basically what we do for everything else in the browser.

Talos can perhaps prove helpful in doing some regression testing, but I suspect we may need to create new Talos tests that specifically exercise the paths we're worried about regressing in the future.  IOW, Talos is a good tool to ensure we keep being fast once we get there, not that useful for becoming fast in the first place.

Note that there are also security implications to some of the existing practices (such as parsing untrusted code in the parent process) which I didn't get into mostly because the fix to those issues is similar to the fix to the performance issues.)
(In reply to Robert Helmer [:rhelmer] from comment #5)
> Would it make sense to have perf tests specific to this feature that use a
> test profile loaded with history?

+1 to that. Although I'll note that when I was initially testing startup
performance for this, I was testing in a skeleton profile with very little
history, and the performance impact was still fairly dramatic.

(In reply to :Ehsan Akhgari from comment #6)
> That will fix the problem I mentioned above (let's pretend for 1 second that
> we know what specific preloaded history is a good test case for this feature
> which is extremely hard) but not other problems.  For example, Talos
> probably never loads a new tab page in a profile that we load pages into
> (such as the Tp5 test.)   I don't know the full list of user interactions
> that Activity Streams is adding but I bet Talos doesn't cover all (most?) of
> them either.

I think it would still probably be useful to make sure talos tests are running
with the kind of realistic profile data that we'd expect to affect Activity
Stream's performance. That would at least increase the chances of it catching
regressions sooner rather than later.
(In reply to Kris Maglione [:kmag] from comment #7)
> (In reply to :Ehsan Akhgari from comment #6)
> > That will fix the problem I mentioned above (let's pretend for 1 second that
> > we know what specific preloaded history is a good test case for this feature
> > which is extremely hard) but not other problems.  For example, Talos
> > probably never loads a new tab page in a profile that we load pages into
> > (such as the Tp5 test.)   I don't know the full list of user interactions
> > that Activity Streams is adding but I bet Talos doesn't cover all (most?) of
> > them either.
> 
> I think it would still probably be useful to make sure talos tests are
> running
> with the kind of realistic profile data that we'd expect to affect Activity
> Stream's performance. That would at least increase the chances of it catching
> regressions sooner rather than later.

Sure, I agree that is valuable for dealing with regressions.  But I still don't think that's a very useful tool for the development stage.  The fact that the Talos tests in comment 2 didn't uncover any of these issues makes me believe that for whatever reason Talos tests right now don't exercise enough of this code.
Another thing that I have noticed testing the Activity Streams add-on is that it has a pretty terrible impact on the session restore performance.  My main browsing profile has hundreds of tabs in several windows and when restoring that session at startup, we spend around 3 seconds only running Activity Streams code on the UI main thread in this profile <https://perfht.ml/2lqTkdC>.  I haven't looked into this one in detail because it seems like many of the problems in this one are the same as the previous profile, so it may very well be that there is nothing specific to session restore here, just that we're doing a lot of work and session restore is already too slow, but I thought I'd mention it here to make sure people test it explicitly.
Whiteboard: [qf:p1]
Flags: needinfo?(tspurway)
I'm marking this qf- because we've had a good conversation with the Activity Stream folks and it's not useful to track this as part of QF right now.
Whiteboard: [qf:p1] → [qf-]
I'm changing the status of this since we are addressing this with the system add-on rewrite, and we will be closing monitoring performance as land each part.
Priority: -- → P3
Priority: P3 → P4
Flags: needinfo?(tspurway)
Component: Activity Streams: General → Activity Streams: Newtab
I am closing this bug, as it appears to be about the Test Pilot add-on, which has been rewritten and these issues were addressed in the rewrite
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 57
Component: Activity Streams: Newtab → New Tab Page
Performance Impact: --- → -
Whiteboard: [qf-]
You need to log in before you can comment on or make changes to this bug.