Closed Bug 1638027 Opened 4 years ago Closed 3 years ago

Intermittent leakcheck | tab 248 bytes leaked (Mutex, PerformanceStorageWorker, WorkerRef)

Categories

(Core :: Performance, defect, P5)

defect

Tracking

()

RESOLVED INVALID
Tracking Status
firefox76 --- unaffected
firefox77 --- unaffected
firefox78 --- wontfix

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [retriggered][stockwell disabled][stockwell needswork:owner])

Attachments

(3 files)

Filed by: btara [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=302287693&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Ionqj6UqQTKwxckYg2srYA/runs/0/artifacts/public/logs/live_backing.log


...
[task 2020-05-14T15:55:11.008Z] 15:55:11     INFO - TEST-PASS | leakcheck | tab no leaks detected!
[task 2020-05-14T15:55:11.008Z] 15:55:11     INFO - leakcheck | Processing leak log file /tmp/tmp88Z3M0.mozrunner/runtests_leaks_tab_pid18064.log
[task 2020-05-14T15:55:11.009Z] 15:55:11     INFO - 
[task 2020-05-14T15:55:11.010Z] 15:55:11     INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 18064
[task 2020-05-14T15:55:11.010Z] 15:55:11     INFO - 
[task 2020-05-14T15:55:11.011Z] 15:55:11     INFO -      |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
[task 2020-05-14T15:55:11.011Z] 15:55:11     INFO -      |                                      | Per-Inst   Leaked|   Total      Rem|
[task 2020-05-14T15:55:11.012Z] 15:55:11     INFO -    0 |TOTAL                                 |       36      248|   55310        3|
[task 2020-05-14T15:55:11.013Z] 15:55:11     INFO -  256 |Mutex                                 |       72       72|     447        1|
[task 2020-05-14T15:55:11.013Z] 15:55:11     INFO -  388 |PerformanceStorageWorker              |       96       96|       2        1|
[task 2020-05-14T15:55:11.014Z] 15:55:11     INFO -  579 |WorkerRef                             |       80       80|      10        1|
[task 2020-05-14T15:55:11.016Z] 15:55:11     INFO - 
[task 2020-05-14T15:55:11.016Z] 15:55:11     INFO - nsTraceRefcnt::DumpStatistics: 943 entries
[task 2020-05-14T15:55:11.017Z] 15:55:11     INFO - TEST-INFO | leakcheck | tab leaked 1 Mutex
[task 2020-05-14T15:55:11.017Z] 15:55:11     INFO - TEST-INFO | leakcheck | tab leaked 1 PerformanceStorageWorker
[task 2020-05-14T15:55:11.018Z] 15:55:11     INFO - TEST-INFO | leakcheck | tab leaked 1 WorkerRef
[task 2020-05-14T15:55:11.018Z] 15:55:11     INFO - TEST-UNEXPECTED-FAIL | leakcheck | tab 248 bytes leaked (Mutex, PerformanceStorageWorker, WorkerRef)
[task 2020-05-14T15:55:11.018Z] 15:55:11     INFO - 
[task 2020-05-14T15:55:11.019Z] 15:55:11     INFO - leakcheck | Processing leak log file /tmp/tmp88Z3M0.mozrunner/runtests_leaks_tab_pid18037.log
...
Has Regression Range: --- → yes
Whiteboard: [retriggered] → [retriggered][stockwell needswork:owner]

Set release status flags based on info from the regressing bug 1634259

Assignee: nobody → apavel
Status: NEW → ASSIGNED
Attachment #9151155 - Attachment description: Bug 1638027 - Disabled browser_ext_windows_size.js, browser_readerMode_readingTime.js, browser_showMenu.js on linux18.04 debug r#?intermittent-reviewers → Bug 1638027 - Disabled browser_ext_windows_size.js, browser_readerMode_readingTime.js, browser_showMenu.js on linux18.04 debug r=jmaher DONTBUILD
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell disabled]
Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/37ab439d2e63
Disabled browser_ext_windows_size.js, browser_readerMode_readingTime.js, browser_showMenu.js on linux18.04 debug r=jmaher DONTBUILD
Keywords: leave-open
Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b25757fea03b
Disable browser_readerMode_pocket.js on linux18.04 debug r=jmaher DONTBUILD

There are 138 failures in the last 7 days.
The "new" ones are on "browser_readerMode_hidden_nodes.js" so I`ll add a disable patch for this one too.
Patricia with this one, there will be 5 tests disabled by this leakcheck.
Can you please take a look?

Flags: needinfo?(plawless)
Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b05abc0736ee
Disable browser_readerMode_hidden_nodes.js on linux18.04 debug r=jmaher DONTBUILD

After the disable, the failure jumped on browser/components/extensions/test/browser/browser_ext_windows_remove.js

Andrew, can you help stopping those leaks, please?

Flags: needinfo?(continuation)

These failures all have Fission enabled, so it would be possible to disable them only for Fission. Reader mode tests have caused issues with Fission before so it isn't too surprising we're seeing more. I think the issue is that they end up creating and destroying a lot of workers. I'll try to take a look soon, but this is probably going to be a worker issue. But maybe I can narrow it down a bit.

Flags: needinfo?(continuation)
See Also: → 1616805

It looks like the closure for the weak worker reference for PerformanceStorageWorker has a strong reference to the worker. I assume there's some kind of shutdown race where we create the weak ref, but never notify the ref that the worker went away.

One approach might be to make the closure reference to the PerformanceStorageWorker weak. You'd clear the callback in the dtor, then. It would probably make some static analysis sad.

I believe the problem here is a continuation of the saga related to starting workers late enough in shutdown that by the time their primary runnable gets a chance to run PBackground is already shutdown in xpcom-shutdown-threads. (But not so late that RuntimeService::Shutdown has set RuntimeService::mShuttingDown to true, thereby causing worker construction to fail.)

Bug 1638170 is the most recent work in this area where we introduced checks that assert strong worker refs aren't handed out prior to the worker reaching the Running state, but we allowed weak refs to be obtained, but WorkerPrivate::RunLoopNeverRan() didn't make the explicit notify transition that would cause the weak refs to be dropped or otherwise approximate it (because Notify* assumes an operating loop). And notably in bug 1615014 I explicitly had us move the creation of PerformanceStorageWorker prior to attempting to create the PBackground actor because the creation of the actor results in activity on the main thread that would otherwise result in a race.

The immediate action here is to make RunLoopNeverRan() cause the WeakWorkerRef instances to be dropped. I'm filing a bug for that in another tab that will block this bug (but that can be refactored).

Flags: needinfo?(bugmail)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #39)

The immediate action here is to make RunLoopNeverRan() cause the WeakWorkerRef instances to be dropped. I'm filing a bug for that in another tab that will block this bug (but that can be refactored).

Meant to add: And the next fix after that is to help make sure we're not trying to start (chrome) workers during the shutdown process without a good reason. (And where "a good reason" means that a shutdown blocker is provided as proof-of-good-reason.)

Assignee: apavel → nobody
Status: ASSIGNED → NEW

This is in the DOM performance API, which wasn't written or as far as I know is actively being worked on by the performance team. Baku might have an idea here.

Flags: needinfo?(plawless) → needinfo?(amarchesini)

Andrew, it seems you have done the first round of debugging. Are you able to provide a fix for this leak?
Maybe we should not create a PerformanceStorageWorker when we are shutting down the browser.

Flags: needinfo?(amarchesini) → needinfo?(bugmail)

These tests are not failing or disabled only for Fission so removing the fission dependency.

No longer blocks: fission-mochitests

The failure rate has massively spiked up in the last week. Given the failures, I'm guessing that this is due to the enabling of browser chrome tests with Fission enabled. Every failure in the 3 weeks has been with Fission enabled.

Yaron will look into this.

Flags: needinfo?(bugmail) → needinfo?(ytausky)

There are 63 total failures in the last 7 days on

  • macosx1015-64-qr debug
  • linux1804-64 debug

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=343384323&repo=autoland&lineNumber=93005

[task 2021-06-21T15:15:52.854Z] 15:15:52 INFO - nsTraceRefcnt::DumpStatistics: 2081 entries
[task 2021-06-21T15:15:52.854Z] 15:15:52 INFO - TEST-PASS | leakcheck | default no leaks detected!
[task 2021-06-21T15:15:52.854Z] 15:15:52 INFO - leakcheck | Processing leak log file /var/folders/c2/gp9_mxbj7px3dsk7fl0bpv5r000014/T/tmpqlp51xqh.mozrunner/runtests_leaks_tab_pid3440.log
[task 2021-06-21T15:15:52.854Z] 15:15:52 INFO -
[task 2021-06-21T15:15:52.855Z] 15:15:52 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 3440
[task 2021-06-21T15:15:52.855Z] 15:15:52 INFO -
[task 2021-06-21T15:15:52.855Z] 15:15:52 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
[task 2021-06-21T15:15:52.855Z] 15:15:52 INFO - | | Per-Inst Leaked| Total Rem|
[task 2021-06-21T15:15:52.856Z] 15:15:52 INFO - 0 |TOTAL | 57 0| 88568 0|
[task 2021-06-21T15:15:52.857Z] 15:15:52 INFO -
[task 2021-06-21T15:15:52.857Z] 15:15:52 INFO - nsTraceRefcnt::DumpStatistics: 985 entries
[task 2021-06-21T15:15:52.857Z] 15:15:52 INFO - TEST-PASS | leakcheck | tab no leaks detected!
[task 2021-06-21T15:15:52.858Z] 15:15:52 INFO - leakcheck | Processing leak log file /var/folders/c2/gp9_mxbj7px3dsk7fl0bpv5r000014/T/tmpqlp51xqh.mozrunner/runtests_leaks_tab_pid3455.log
[task 2021-06-21T15:15:52.858Z] 15:15:52 INFO -
[task 2021-06-21T15:15:52.859Z] 15:15:52 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 3455
[task 2021-06-21T15:15:52.859Z] 15:15:52 INFO -
[task 2021-06-21T15:15:52.859Z] 15:15:52 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
[task 2021-06-21T15:15:52.860Z] 15:15:52 INFO - | | Per-Inst Leaked| Total Rem|
[task 2021-06-21T15:15:52.860Z] 15:15:52 INFO - 0 |TOTAL | 65 312| 59184 3|
[task 2021-06-21T15:15:52.860Z] 15:15:52 INFO - 253 |Mutex | 96 96| 668 1|
[task 2021-06-21T15:15:52.861Z] 15:15:52 INFO - 373 |PerformanceStorageWorker | 120 120| 2 1|
[task 2021-06-21T15:15:52.861Z] 15:15:52 INFO - 564 |WorkerRef | 96 96| 9 1|
[task 2021-06-21T15:15:52.861Z] 15:15:52 INFO -
[task 2021-06-21T15:15:52.862Z] 15:15:52 INFO - nsTraceRefcnt::DumpStatistics: 943 entries
[task 2021-06-21T15:15:52.862Z] 15:15:52 INFO - TEST-INFO | leakcheck | tab leaked 1 Mutex
[task 2021-06-21T15:15:52.862Z] 15:15:52 INFO - TEST-INFO | leakcheck | tab leaked 1 PerformanceStorageWorker
[task 2021-06-21T15:15:52.863Z] 15:15:52 INFO - TEST-INFO | leakcheck | tab leaked 1 WorkerRef
[task 2021-06-21T15:15:52.863Z] 15:15:52 INFO - TEST-UNEXPECTED-FAIL | leakcheck | tab 312 bytes leaked (Mutex, PerformanceStorageWorker, WorkerRef)
[task 2021-06-21T15:15:52.863Z] 15:15:52 INFO -
[task 2021-06-21T15:15:52.864Z] 15:15:52 INFO - leakcheck | Processing leak log file /var/folders/c2/gp9_mxbj7px3dsk7fl0bpv5r000014/T/tmpqlp51xqh.mozrunner/runtests_leaks_tab_pid3441.log
[task 2021-06-21T15:15:52.864Z] 15:15:52 INFO -
[task 2021-06-21T15:15:52.864Z] 15:15:52 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 3441
[task 2021-06-21T15:15:52.865Z] 15:15:52 INFO -
[task 2021-06-21T15:15:52.865Z] 15:15:52 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
[task 2021-06-21T15:15:52.865Z] 15:15:52 INFO - | | Per-Inst Leaked| Total Rem|
[task 2021-06-21T15:15:52.866Z] 15:15:52 INFO - 0 |TOTAL | 53 0| 21544 0|
[task 2021-06-21T15:15:52.866Z] 15:15:52 INFO -
[task 2021-06-21T15:15:52.866Z] 15:15:52 INFO - nsTraceRefcnt::DumpStatistics: 770 entries
[task 2021-06-21T15:15:52.867Z] 15:15:52 INFO - TEST-PASS | leakcheck | tab no leaks detected!

Whiteboard: [retriggered][stockwell disabled] → [retriggered][stockwell disabled][stockwell needswork:owner]
Flags: needinfo?(ytausky) → needinfo?(jstutte)

This is not happening since October.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(jstutte)
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: