Closed Bug 1653060 Opened 2 years ago Closed 2 years ago

Intermittent leakcheck | rdd 3544 bytes leaked (AbstractThread, ActorLifecycleProxy, CondVar, IPC::Channel, Mutex, ...)

Categories

(Core :: Audio/Video: Playback, defect, P1)

Tracking

()

RESOLVED FIXED

People

(Reporter: intermittent-bug-filer, Assigned: jya)

References

Details

(Keywords: intermittent-failure)

Attachments

(6 files)

Filed by: btara [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=309830807&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Cnz-C20qSfqV3BlBQP8I0Q/runs/0/artifacts/public/logs/live_backing.log


...
[task 2020-07-15T10:11:42.336Z] 10:11:42     INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, rdd process 148
[task 2020-07-15T10:11:42.337Z] 10:11:42     INFO - 
[task 2020-07-15T10:11:42.338Z] 10:11:42     INFO -      |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
[task 2020-07-15T10:11:42.338Z] 10:11:42     INFO -      |                                      | Per-Inst   Leaked|   Total      Rem|
[task 2020-07-15T10:11:42.339Z] 10:11:42     INFO -    0 |TOTAL                                 |       23     3544|    5739       16|
[task 2020-07-15T10:11:42.340Z] 10:11:42     INFO -    1 |AbstractThread                        |       40       40|       3        1|
[task 2020-07-15T10:11:42.340Z] 10:11:42     INFO -    2 |ActorLifecycleProxy                   |       32       32|       5        1|
[task 2020-07-15T10:11:42.341Z] 10:11:42     INFO -   14 |CondVar                               |       64      128|      23        2|
[task 2020-07-15T10:11:42.342Z] 10:11:42     INFO -   25 |IPC::Channel                          |        8        8|       4        1|
[task 2020-07-15T10:11:42.342Z] 10:11:42     INFO -   45 |Mutex                                 |       80      240|     277        3|
[task 2020-07-15T10:11:42.343Z] 10:11:42     INFO -   61 |PRemoteDecoderManagerParent           |      688      688|       1        1|
[task 2020-07-15T10:11:42.344Z] 10:11:42     INFO -   74 |RefCountedMonitor                     |      152      152|       4        1|
[task 2020-07-15T10:11:42.344Z] 10:11:42     INFO -   76 |RemoteDecoderManagerParent            |      736     1472|       2        2|
[task 2020-07-15T10:11:42.345Z] 10:11:42     INFO -   91 |StoreRef                              |       16       16|       4        1|
[task 2020-07-15T10:11:42.345Z] 10:11:42     INFO -   97 |TaskQueue                             |      336      336|       2        1|
[task 2020-07-15T10:11:42.346Z] 10:11:42     INFO -  128 |ipc::MessageChannel                   |      392      392|       4        1|
[task 2020-07-15T10:11:42.346Z] 10:11:42     INFO -  129 |ipc::MessageChannel::DispatchOnChannel|       40       40|       4        1|
[task 2020-07-15T10:11:42.346Z] 10:11:42     INFO - 
[task 2020-07-15T10:11:42.346Z] 10:11:42     INFO - nsTraceRefcnt::DumpStatistics: 151 entries
[task 2020-07-15T10:11:42.347Z] 10:11:42     INFO - TEST-INFO | leakcheck | rdd leaked 1 AbstractThread
[task 2020-07-15T10:11:42.347Z] 10:11:42     INFO - TEST-INFO | leakcheck | rdd leaked 1 ActorLifecycleProxy
[task 2020-07-15T10:11:42.347Z] 10:11:42     INFO - TEST-INFO | leakcheck | rdd leaked 2 CondVar
[task 2020-07-15T10:11:42.347Z] 10:11:42     INFO - TEST-INFO | leakcheck | rdd leaked 1 IPC::Channel
[task 2020-07-15T10:11:42.348Z] 10:11:42     INFO - TEST-INFO | leakcheck | rdd leaked 3 Mutex
[task 2020-07-15T10:11:42.348Z] 10:11:42     INFO - TEST-INFO | leakcheck | rdd leaked 1 PRemoteDecoderManagerParent
[task 2020-07-15T10:11:42.348Z] 10:11:42     INFO - TEST-INFO | leakcheck | rdd leaked 1 RefCountedMonitor
[task 2020-07-15T10:11:42.348Z] 10:11:42     INFO - TEST-INFO | leakcheck | rdd leaked 2 RemoteDecoderManagerParent
[task 2020-07-15T10:11:42.349Z] 10:11:42     INFO - TEST-INFO | leakcheck | rdd leaked 1 StoreRef
[task 2020-07-15T10:11:42.349Z] 10:11:42     INFO - TEST-INFO | leakcheck | rdd leaked 1 TaskQueue
[task 2020-07-15T10:11:42.349Z] 10:11:42     INFO - TEST-INFO | leakcheck | rdd leaked 1 ipc::MessageChannel
[task 2020-07-15T10:11:42.349Z] 10:11:42     INFO - TEST-INFO | leakcheck | rdd leaked 1 ipc::MessageChannel::DispatchOnChannel
[task 2020-07-15T10:11:42.350Z] 10:11:42     INFO - TEST-UNEXPECTED-FAIL | leakcheck | rdd 3544 bytes leaked (AbstractThread, ActorLifecycleProxy, CondVar, IPC::Channel, Mutex, ...)
[task 2020-07-15T10:11:42.350Z] 10:11:42     INFO - 
...
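
As a cross-check of the BloatView table above, each "Leaked" value equals the per-instance size multiplied by the number of remaining objects, and the rows add up to the reported total. The sketch below simply copies the numbers from the log; the struct and function names are hypothetical and not part of any Mozilla tooling.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical row type: values copied from the BloatView excerpt above.
struct LeakRow {
  const char* className;
  int bytesPerInstance;
  int remainingObjects;
};

constexpr LeakRow kLeakRows[] = {
    {"AbstractThread", 40, 1},
    {"ActorLifecycleProxy", 32, 1},
    {"CondVar", 64, 2},
    {"IPC::Channel", 8, 1},
    {"Mutex", 80, 3},
    {"PRemoteDecoderManagerParent", 688, 1},
    {"RefCountedMonitor", 152, 1},
    {"RemoteDecoderManagerParent", 736, 2},
    {"StoreRef", 16, 1},
    {"TaskQueue", 336, 1},
    {"ipc::MessageChannel", 392, 1},
    {"ipc::MessageChannel::DispatchOnChannel", 40, 1},
};

// Sums bytesPerInstance * remainingObjects over all rows.
int TotalLeakedBytes() {
  int total = 0;
  for (const LeakRow& row : kLeakRows) {
    total += row.bytesPerInstance * row.remainingObjects;
  }
  return total;
}

// Sums the remaining (leaked) object counts over all rows.
int TotalLeakedObjects() {
  int total = 0;
  for (const LeakRow& row : kLeakRows) {
    total += row.remainingObjects;
  }
  return total;
}
```

With these rows, TotalLeakedBytes() gives 3544 and TotalLeakedObjects() gives 16, matching the TOTAL line. The two RemoteDecoderManagerParent instances account for 1472 of the 3544 bytes.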

RemoteDecoderManagerParent

Component: DOM: Core & HTML → Audio/Video: Playback
Severity: normal → S3
Priority: P5 → P1
No longer blocks: 1661328
Assignee: nobody → jyavenard
See Also: → 1662781
Pushed by jgilbert@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/056bbc57ca7c
Increase RDD leak threshold to 4000 for now. r=mccr8

Background information on what is happening here.

The parent process controls the RDDParent via the PRDD protocol.
Then we have content processes potentially using a RemoteDecoderManagerChild/Parent via the PRemoteDecoderManager protocol.

During shutdown, the RDDChild is shut down when XPCOM shuts down, which instructs the RDDParent to shut down the RDD process.

This can happen before every content process has completed shutdown and closed its respective PRemoteDecoderManager channel.
All the RemoteDecoderManagerParent actors still waiting to be closed will be reported as leaking. A RemoteDecoderManagerParent manages a task queue and an array of images.

The GPU process leaks RemoteDecoderManagerParent in exactly the same fashion; however, its threshold is set to 10kB, so the leaks aren't apparent. But they are definitely there.

We would need a mechanism similar to the MediaShutdownMonitor in the RDD and GPU processes, so that even when the GPUParent/RDDParent receives an instruction to shut down its process, it won't act on it until all registered RemoteDecoderManagerParent actors have been closed.

The RDD process gets shut down following an NS_XPCOM_SHUTDOWN_OBSERVER_ID notification.
Notifications are processed in LIFO order; since the RDD process is started on demand, it would typically have been registered after a content process.
We must ensure that the RDD gets shut down after all content processes so that it can receive notifications that the RemoteDecoderManagerChild actors are shutting down.
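
To illustrate the ordering problem described above, here is a minimal sketch of an observer list that notifies in LIFO order. It is not the XPCOM observer service; LifoObserverList, DemoShutdownOrder and the observer names are hypothetical and exist only to show why a late-registered RDD observer is notified before the content process one.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical observer list: last registered, first notified.
class LifoObserverList {
 public:
  void AddObserver(std::string name, std::function<void()> callback) {
    mObservers.emplace_back(std::move(name), std::move(callback));
  }

  // Runs the callbacks in reverse registration order and returns the
  // observer names in the order they were notified.
  std::vector<std::string> NotifyAll() const {
    std::vector<std::string> order;
    for (auto it = mObservers.rbegin(); it != mObservers.rend(); ++it) {
      it->second();
      order.push_back(it->first);
    }
    return order;
  }

 private:
  std::vector<std::pair<std::string, std::function<void()>>> mObservers;
};

// Demo: the content process observer is registered first; the RDD observer
// is registered later because the RDD process is started on demand.
std::vector<std::string> DemoShutdownOrder() {
  LifoObserverList observers;
  observers.AddObserver("content-process", [] {});
  observers.AddObserver("rdd-process", [] {});
  return observers.NotifyAll();
}
```

In this sketch DemoShutdownOrder() reports "rdd-process" before "content-process", which is the ordering that has to be avoided so the RDD process is still alive when the content processes close their channels.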

Depends on D90484

Depends on D90485

We unfortunately can't use the AsyncShutdownService in either the GPU or RDD process.

So we add a little utility class AsyncBlockers that will resolve its promise once all services have deregistered from it.

We use it to temporarily prevent the RDDParent or GPUParent from killing the process, for up to 10s.

This allows for cleaner shutdown as the parent process doesn't guarantee the order in which processes are killed (even though it should).
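
The blocking idea can be sketched in standard C++ as follows. This is a simplified illustration, not the AsyncBlockers class from the patch: the patch description says the real class resolves a promise, whereas this hypothetical ShutdownBlockers class waits synchronously on a condition variable, and all names in it are assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for the blocker idea: shutdown proceeds only once
// every registered blocker has deregistered, or once a timeout expires.
class ShutdownBlockers {
 public:
  // Called when a service that must finish before shutdown is created.
  // Names are assumed to be unique.
  void Register(const std::string& name) {
    std::lock_guard<std::mutex> lock(mMutex);
    mBlockers.insert(name);
  }

  // Called when the service has finished (e.g. its channel has closed).
  void Deregister(const std::string& name) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      std::size_t removed = mBlockers.erase(name);
      assert(removed == 1 && "deregistering an unknown blocker");
      (void)removed;  // Avoid an unused-variable warning when NDEBUG is set.
    }
    mCleared.notify_all();
  }

  // Waits until no blockers remain or the timeout elapses.
  // Returns true if every blocker deregistered in time.
  bool WaitUntilClear(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mMutex);
    return mCleared.wait_for(lock, timeout,
                             [this] { return mBlockers.empty(); });
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCleared;
  std::set<std::string> mBlockers;
};
```

In a shutdown path, each actor would call Register() when it is created and Deregister() when its channel closes, typically from another thread, while the code about to kill the process would call WaitUntilClear(std::chrono::seconds(10)) first and proceed regardless of the result, mirroring the 10s cap described above.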

Depends on D90486

And fix thread-safety access to sRemoteDecoderManagerChildThread static while at it.

Depends on D90487

Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/68c5b2c56f3f
P1. Revert "Increase RDD leak threshold to 4000 for now. r=mccr8"
https://hg.mozilla.org/integration/autoland/rev/a38425c96709
P2. Ensure the RDD process gets shutdown after content processes. r=mjf
https://hg.mozilla.org/integration/autoland/rev/6943102ffe2a
P3. Use nsCOMPtr. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/d143ac59991f
P4. Wait until all MediaRemoteDecoderManagerParent have closed before killing process. r=mattwoodrow.

For some reason, P5, which fixed those assertions, didn't get pushed.

Flags: needinfo?(jyavenard)
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ef7eca1879d9
P1. Revert "Increase RDD leak threshold to 4000 for now. r=mccr8"
https://hg.mozilla.org/integration/autoland/rev/a8124dd6945d
P2. Ensure the RDD process gets shutdown after content processes. r=mjf
https://hg.mozilla.org/integration/autoland/rev/22b8a193cae1
P3. Use nsCOMPtr. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/fe83737b9acf
P4. Wait until all MediaRemoteDecoderManagerParent have closed before killing process. r=mattwoodrow.
https://hg.mozilla.org/integration/autoland/rev/3695534b2938
P5. Ensure no task gets dispatched after shutdown. r=mattwoodrow
Status: NEW → RESOLVED
Closed: 2 years ago
Keywords: leave-open
Resolution: --- → FIXED
No longer regressions: 1668034
See Also: → 1613128
See Also: → 1672350
See Also: → 1672510