Mark a content process as troubled if it has a pattern of jank

Status: NEW
Assignee: Unassigned
Type: defect
Priority: P2
Severity: normal
Reported: 2 years ago
Modified: 2 years ago

Reporter: benjamin

Version: Trunk
Target Milestone: mozilla57
Hardware: Unspecified
OS: All

Firefox Tracking Flags: (Not tracked)

Whiteboard: [e10s-multi:+]

User Story

A content process should be marked as troubled in any of the following circumstances:

* input latency of >500ms is detected
* a GC or CC slice takes 250ms past expected/idle time (this is not the total GC/CC slice time: see bug 1373292)
* two consecutive memory measurements list non-zero ghost windows

All time measurements should exclude the following situations:
* time during startup until the content process is running stably
* delays due to the computer sleeping

Telemetry should record and monitor the rate at which a content process is marked as troubled.
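The criteria in the user story could be checked per process roughly as follows. This is a minimal C++ sketch, not Gecko code. `TroubleDetector`, its methods, and the `sleptDuring` flag are hypothetical names. It assumes the caller already knows four things: the input latency of each event, how far a GC/CC slice overran its expected/idle time, the ghost-window count from each memory measurement, and whether the computer slept during a timed measurement. Each `On*` method returns true only when that call newly marks the process as troubled. That is the point where a caller could record telemetry and invoke ContentParent::MarkAsTroubled.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Ms = std::chrono::milliseconds;

// Thresholds from the user story (hypothetical constant names).
constexpr Ms kInputLatencyLimit{500};  // trouble if latency is strictly above this
constexpr Ms kGcCcOverrunLimit{250};   // trouble if a slice overruns by at least this

// Hypothetical per-process detector; not an existing Gecko class.
class TroubleDetector {
 public:
  // Called once the content process has finished starting up and runs stably.
  // Time measurements reported before this are ignored.
  void MarkStable() { stable_ = true; }

  // `latency` is the input latency of one event; `sleptDuring` is true if the
  // computer slept while it was being measured, in which case it is ignored.
  bool OnInputLatency(Ms latency, bool sleptDuring) {
    if (!TimeMeasurementCounts(sleptDuring)) return false;
    return latency > kInputLatencyLimit && MarkTroubled();
  }

  // `overrun` is how far a GC or CC slice ran past its expected/idle time,
  // not the total slice duration (see bug 1373292).
  bool OnGcCcSlice(Ms overrun, bool sleptDuring) {
    if (!TimeMeasurementCounts(sleptDuring)) return false;
    return overrun >= kGcCcOverrunLimit && MarkTroubled();
  }

  // Ghost windows are a count, not a time measurement, so the startup and
  // sleep exclusions are not applied here.
  bool OnMemoryMeasurement(std::uint32_t ghostWindows) {
    const bool hasGhosts = ghostWindows > 0;
    const bool twoInARow = hasGhosts && previousHadGhosts_;
    previousHadGhosts_ = hasGhosts;
    return twoInARow && MarkTroubled();
  }

  bool IsTroubled() const { return troubled_; }

 private:
  bool TimeMeasurementCounts(bool sleptDuring) const {
    return stable_ && !sleptDuring;
  }

  // Returns true only on the transition into the troubled state, so the
  // caller reports telemetry (and marks the process) exactly once.
  bool MarkTroubled() {
    if (troubled_) return false;
    troubled_ = true;
    return true;
  }

  bool stable_ = false;
  bool troubled_ = false;
  bool previousHadGhosts_ = false;
};
```

The sketch applies the startup and sleep exclusions only to the two timed checks, because the user story scopes them to "time measurements". Reporting only the transition means each troubled process contributes a single telemetry event, which is what measuring the rate needs.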

In order to reduce the impact of jank, we should try to detect when a content process is experiencing problems and stop using it. The "stop using it" mechanism already exists via ContentParent::MarkAsTroubled, and we currently use it for low-memory notifications.

I'm taking a SWAG at a specification of this, but I'd like some GC/CC experts as well as input latency experts to review and suggest a better specification.
Requesting feedback especially on the user story field.
Flags: needinfo?(jcoppeard)
Flags: needinfo?(continuation)
Flags: needinfo?(bugs)
Risks:
the user hits jank so often that we continually mark processes as troubled. This could leave us with many processes (a memory-usage risk), or slow page loads because we launch new processes frequently.

250ms sounds reasonable to me.

This isn't related to what you asked, but with regard to comment 2, I'd think offhand you wouldn't want to exceed the standard process count even if all of the processes are marked troubled. Exceeding it seems like a recipe for problems if the user frequently visits a site that causes ghost windows, or what have you.
Flags: needinfo?(continuation)
Do we ever mark troubled processes back to untroubled? That seems like it could be pretty common if the user closes the bad tab, etc.
Whiteboard: [e10s-multi:?]
When a process is marked troubled it's removed from the available processes, so it no longer counts toward the standard process limit.

Currently there is no way to un-trouble a process. I'm not opposed, but I'd hope for a better signal that the problem was caused by a particular tab/page and that it has now been resolved.
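The pool behaviour described here can be modelled in a few lines: a troubled process leaves the set of available processes and stops counting toward the limit. This is a hypothetical C++ sketch. `ContentProcessPool`, `ProcessForNewTab`, and `RetireTroubled` are illustrative names, not Gecko APIs. The round-robin reuse policy is an assumption made only to keep the example complete.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: only processes in `available_` may receive new tabs,
// and only they count toward `limit_`.
class ContentProcessPool {
 public:
  explicit ContentProcessPool(std::size_t limit) : limit_(limit) {
    assert(limit > 0);
  }

  // Launches a new process while under the limit; otherwise reuses an
  // available one in round-robin order (an assumed policy).
  int ProcessForNewTab() {
    if (available_.size() < limit_) {
      available_.push_back(nextId_++);
      return available_.back();
    }
    int id = available_[cursor_ % available_.size()];
    ++cursor_;
    return id;
  }

  // A troubled process keeps running its existing tabs but is never handed
  // out again, so it no longer counts toward the limit.
  void RetireTroubled(int id) {
    auto it = std::find(available_.begin(), available_.end(), id);
    if (it != available_.end()) {
      available_.erase(it);
      ++troubledCount_;
    }
  }

  std::size_t AvailableCount() const { return available_.size(); }
  std::size_t TotalRunning() const { return available_.size() + troubledCount_; }

 private:
  std::size_t limit_;
  std::vector<int> available_;
  std::size_t troubledCount_ = 0;
  std::size_t cursor_ = 0;
  int nextId_ = 1;
};
```

With a limit of 2, retiring one process lets the next tab launch a third process. `TotalRunning()` then exceeds the limit, which is the memory-usage risk raised earlier in this bug.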
Do we have a bug tracking "interesting things we can do with content processes [once we can move tabs between them in the not-so-near future]"? Maybe this could be done before then, but ideally we'd move existing content out of a process once we knew it was troubled.
(In reply to Andrew Overholt [:overholt] from comment #6)
> Do we have a bug tracking "interesting things we can do with content
> processes [once we can move tabs between them in the not-so-near future]"?
> Maybe this could be done before then, but ideally we'd move existing content
> out of a process once we knew it was troubled.

I have a doc where I'm trying to organise future e10s-multi-related items; I'm going to add this there for now. Needinfo-ing myself to file the related metabug too. I'll probably get there in a couple of weeks...
Flags: needinfo?(gkrizsanits)
Moving tabs sounds good! But I don't think it should block shorter-term things we can accomplish in the meantime.
Duplicate of this bug: 1386290
Flags: needinfo?(jcoppeard)
Flags: needinfo?(bugs)
OS: Unspecified → All
Priority: -- → P1
Whiteboard: [e10s-multi:?] → [e10s-multi:+]
Target Milestone: --- → mozilla57
Version: unspecified → Trunk
(In reply to Andrew Overholt [:overholt] from comment #6)
> "interesting things we can do with content
> processes [once we can move tabs between them in the not-so-near future]"?

Let's focus on short-term solutions in this bug. I've added some notes to my planning docs, but I feel we're still quite far from moving windows between processes.
Flags: needinfo?(gkrizsanits)
Moving the tracking nom from bug 1386290 to this one.
This is still unassigned, though it was marked P1 a couple of months ago.
It looks to me like this should have a different priority, and it's probably not going to make it into 57.
Jim, Bill, what do you think?
Flags: needinfo?(wmccloskey)
Flags: needinfo?(jmathies)
This bug is related to development work. This shouldn't track a release. Cleaning up.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(jmathies)
Priority: P1 → P2