Open Bug 1499551 Opened 6 years ago Updated 3 years ago

Treeherder consuming much memory/RAM (and CPU due to GC/garbage collection?)

Categories

(Tree Management :: Treeherder: Frontend, defect, P1)

defect

Tracking

(Not tracked)

REOPENED

People

(Reporter: aryx, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

Attachments

(2 files)

For the last few hours, Treeherder seems to require a lot of memory (I mapped process IDs to URLs via about:memory). Those content processes grow to multiple GB (RyanVM had one >8 GB). They also often show increased CPU load (one virtual core busy); from peeking at the stack, this is likely garbage collection due to low free memory.

On Nightly I interacted with TH for a while, and after ~30 minutes the system reported it was out of memory (this has happened several times during the last hours). With TH open in a minimal Firefox 62.0.3 profile (no other tabs) and without any interaction, memory usage also increases, but not as quickly as in the Nightly profile.
Flags: needinfo?(cdawson)
Any chance you can get the memory report from about:memory?
Posting the report for one content process here; if you need a full anonymized version, let me know.

1,475.34 MB (100.0%) -- explicit
├──1,369.35 MB (92.82%) -- window-objects
│  ├──1,364.97 MB (92.52%) -- top(https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=usercancel%2Cretry%2Crunnable%2Ctestfailed%2Cbusted%2Cexception&group_state=expanded&selectedJob=205688230&fromchange=94a62c1aad526dc24dc9186a6ccebb0db276ee87, id=15032385537)
│  │  ├──1,118.07 MB (75.78%) -- active/window(https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=usercancel%2Cretry%2Crunnable%2Ctestfailed%2Cbusted%2Cexception&group_state=expanded&selectedJob=205688230&fromchange=94a62c1aad526dc24dc9186a6ccebb0db276ee87)
│  │  │  ├──1,098.02 MB (74.42%) -- js-realm(https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=usercancel%2Cretry%2Crunnable%2Ctestfailed%2Cbusted%2Cexception&group_state=expanded&selectedJob=205688230&fromchange=94a62c1aad526dc24dc9186a6ccebb0db276ee87)
│  │  │  │  ├──1,093.85 MB (74.14%) -- classes
│  │  │  │  │  ├────476.77 MB (32.32%) -- class(Object)/objects
│  │  │  │  │  │    ├──241.09 MB (16.34%) ── gc-heap
│  │  │  │  │  │    └──235.69 MB (15.98%) -- malloc-heap
│  │  │  │  │  │       ├──235.65 MB (15.97%) ── slots
│  │  │  │  │  │       └────0.03 MB (00.00%) ── elements/normal
│  │  │  │  │  ├────264.64 MB (17.94%) -- class(Array)/objects
│  │  │  │  │  │    ├──244.43 MB (16.57%) ── gc-heap
│  │  │  │  │  │    └───20.22 MB (01.37%) -- malloc-heap
│  │  │  │  │  │        ├──20.21 MB (01.37%) ── elements/normal
│  │  │  │  │  │        └───0.00 MB (00.00%) ── slots
│  │  │  │  │  ├────217.62 MB (14.75%) -- class(Call)/objects
│  │  │  │  │  │    ├──217.60 MB (14.75%) ── gc-heap
│  │  │  │  │  │    └────0.03 MB (00.00%) ── malloc-heap/slots
│  │  │  │  │  ├────133.03 MB (09.02%) -- class(Function)/objects
│  │  │  │  │  │    ├──132.97 MB (09.01%) ── gc-heap
│  │  │  │  │  │    └────0.06 MB (00.00%) ── malloc-heap/slots
│  │  │  │  │  └──────1.78 MB (00.12%) ++ (15 tiny)
│  │  │  │  └──────4.17 MB (00.28%) ++ (7 tiny)
│  │  │  ├─────15.06 MB (01.02%) ++ layout
│  │  │  └──────4.99 MB (00.34%) ++ (3 tiny)
│  │  └────246.90 MB (16.74%) -- js-zone(0x891a800)
│  │       ├──144.29 MB (09.78%) -- strings
│  │       │  ├───99.41 MB (06.74%) -- string(<non-notable strings>)
│  │       │  │   ├──74.38 MB (05.04%) -- gc-heap
│  │       │  │   │  ├──74.28 MB (05.03%) ── latin1
│  │       │  │   │  └───0.11 MB (00.01%) ── two-byte
│  │       │  │   └──25.03 MB (01.70%) -- malloc-heap
│  │       │  │      ├──24.72 MB (01.68%) ── latin1
│  │       │  │      └───0.31 MB (00.02%) ── two-byte
│  │       │  └───44.88 MB (03.04%) ++ (531 tiny)
│  │       ├───79.01 MB (05.36%) -- shapes
│  │       │   ├──62.21 MB (04.22%) -- gc-heap
│  │       │   │  ├──61.57 MB (04.17%) ── dict
│  │       │   │  └───0.64 MB (00.04%) ++ (2 tiny)
│  │       │   └──16.80 MB (01.14%) -- malloc-heap
│  │       │      ├──15.52 MB (01.05%) ── dict-tables
│  │       │      └───1.29 MB (00.09%) ++ (2 tiny)
│  │       ├───19.35 MB (01.31%) ── gc-heap-arena-admin
│  │       └────4.25 MB (00.29%) ++ (14 tiny)
│  └──────4.37 MB (00.30%) ++ top(about:newtab, id=15032385551)
├─────60.00 MB (04.07%) -- js-non-window
│     ├──36.57 MB (02.48%) -- runtime
│     │  ├──18.44 MB (01.25%) ++ (14 tiny)
│     │  └──18.13 MB (01.23%) ++ gc
│     ├──19.33 MB (01.31%) -- gc-heap
│     │  ├──18.33 MB (01.24%) ── chunk-admin
│     │  └───1.00 MB (00.07%) ++ (2 tiny)
│     └───4.10 MB (00.28%) ++ (3 tiny)
├─────30.92 MB (02.10%) ++ (24 tiny)
└─────15.07 MB (01.02%) ── heap-unclassified
Assignee: nobody → cdawson
Status: NEW → ASSIGNED
Flags: needinfo?(cdawson)
This hit me all day and required frequent restarts. I then noticed this difference:

Having a watched tree with pushes open (e.g. autoland) caused the CPU load of content processes to often jump to one full virtual core.

Having 50 tabs with a single Treeherder push in each does not cause the issue.

After closing the tabs with those trees and reopening them, the CPU load of some content processes is no longer pegged at one virtual core all the time, only sometimes. Memory usage still keeps increasing.
I'm investigating this today.  Thanks for the additional information.  This is my top priority right now.
Component: Treeherder → Treeherder: Frontend
I have a fix that seems to help with this.  I don't know that it's a "smoking gun" pointing to the root cause.  But the growth is in line with what I'd expect with the window sitting there (it's always loading new jobs as they are created).
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/4d374ab6c01e69596869e6e688ad2a1430a06b5b
Bug 1499551 - Address memory leak when loading new pushes

This fix is not a "smoking gun" kind of fix, but it does appear to remedy
the problem in local testing.  It uses the spread syntax rather than
adding to the existing push array, which I believe to be more correct.

I believe the pushList.push call was what was causing problems.
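A minimal sketch of the difference the commit describes; the variable names below are illustrative assumptions, not the actual Treeherder code. Mutating the array held in React state can defeat reference-based change detection, whereas building a fresh array with spread syntax leaves the old reference untouched:

```javascript
// Hypothetical state array plus newly arrived pushes (illustrative data).
const pushList = [{ id: 1 }, { id: 2 }];
const newPushes = [{ id: 3 }];

// Anti-pattern: pushList.push(...newPushes) mutates the state array in place,
// so React may not notice the change and old references stay reachable.

// The approach the commit describes: produce a new array via spread syntax.
const updated = [...pushList, ...newPushes];
// `updated` is a new reference; `pushList` is unchanged.
```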
There was a request for feedback to test with https://treeherder.allizom.org/; the situation is unchanged (tested with Nightly).

Steps to reproduce:
1. Go to https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound
2. Load inbound, autoland and central trees.
3. Load 50 more pushes for each tree.

The memory footprint of the content processes increases; after ~30 minutes it was roughly two processes at 1.5 GB plus one at 2 GB. CPU utilization was one virtual core at the start and later three (one per content process).
With the fix for Bug 1501682, I believe this issue is remedied.  Please reopen if you find that not to be the case.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Depends on: 1501682
Priority: -- → P1
Bug 1501682 fixed the high CPU load, but the high-memory situation persists. E.g. there are several web content processes listed in Nightly's about:memory with numbers similar to these:

Explicit Allocations

1,447,002,112 B (100.0%) -- explicit
├──1,232,526,648 B (85.18%) -- window-objects
│  ├──1,135,619,264 B (78.48%) -- top(https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&group_state=expanded&fromchange=e8d261f997c96dfdcb8e8b2cecec5d21375b7672&searchStr=windows%2Cpgo, id=12884901898)
│  │  ├──1,035,325,896 B (71.55%) -- active/window(https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&group_state=expanded&fromchange=e8d261f997c96dfdcb8e8b2cecec5d21375b7672&searchStr=windows%2Cpgo)
│  │  │  ├──1,004,602,720 B (69.43%) -- js-realm(https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&group_state=expanded&fromchange=e8d261f997c96dfdcb8e8b2cecec5d21375b7672&searchStr=windows%2Cpgo)
│  │  │  │  ├──1,003,827,640 B (69.37%) -- classes
│  │  │  │  │  ├────365,941,544 B (25.29%) -- class(Array)/objects
│  │  │  │  │  │    ├──364,429,600 B (25.19%) ── gc-heap
│  │  │  │  │  │    └────1,511,944 B (00.10%) -- malloc-heap
│  │  │  │  │  │         ├──1,511,376 B (00.10%) ── elements/normal
│  │  │  │  │  │         └────────568 B (00.00%) ── slots
│  │  │  │  │  ├────361,979,360 B (25.02%) -- class(Call)/objects
│  │  │  │  │  │    ├──361,963,808 B (25.01%) ── gc-heap
│  │  │  │  │  │    └───────15,552 B (00.00%) ── malloc-heap/slots
│  │  │  │  │  ├────217,890,688 B (15.06%) -- class(Function)/objects
│  │  │  │  │  │    ├──217,851,328 B (15.06%) ── gc-heap
│  │  │  │  │  │    └───────39,360 B (00.00%) ── malloc-heap/slots
│  │  │  │  │  ├─────56,046,416 B (03.87%) -- class(Object)/objects
│  │  │  │  │  │     ├──30,134,528 B (02.08%) -- malloc-heap
│  │  │  │  │  │     │  ├──30,103,616 B (02.08%) ── slots
│  │  │  │  │  │     │  └──────30,912 B (00.00%) ── elements/normal
│  │  │  │  │  │     └──25,911,888 B (01.79%) ── gc-heap
...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 1067846
Flags: needinfo?(cdawson)
Working on this now and for the foreseeable future.  I think there are probably several pieces to this puzzle.
Flags: needinfo?(cdawson)
Is https://i.imgur.com/xL8Cb8h.jpg helpful to figure out which code is leaking memory?
Flags: needinfo?(cdawson)
(In reply to Florian Quèze [:florian] from comment #12)
> Is https://i.imgur.com/xL8Cb8h.jpg helpful to figure out which code is
> leaking memory?

Inverting the call stacks (https://i.imgur.com/uiEHOk2.jpg) shows that much of the increased memory usage comes from .bind() calls, which are triggered by enqueueSetState calls (with some minified code in the middle).
Likely the issue is more in the render() methods that do the binds. I don't know why the memory is retained though...
Without looking at the Treeherder code, or knowing what bind_bindFunction does, I have a hypothesis.

I've seen this anti-pattern crop up from time to time:

el.addEventListener("someEvent", someFunction.bind(this));

and then during some kind of teardown:

el.removeEventListener("someEvent", someFunction.bind(this));


This won't actually remove the event handler, since someFunction.bind(this) returns a new function reference each time.

I wonder if we're doing something like that somewhere, and keeping a bunch of handlers (and their scopes) alive?
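The anti-pattern described above can be demonstrated with a small sketch. The mock target below is an assumption for illustration (a Set keyed by function identity, which is how the real DOM EventTarget also matches listeners on removal):

```javascript
// Minimal stand-in for a DOM event target, keyed by listener identity.
class TargetMock {
  constructor() { this.listeners = new Set(); }
  addEventListener(type, fn) { this.listeners.add(fn); }
  removeEventListener(type, fn) { this.listeners.delete(fn); }
}

function someFunction() {}
const ctx = {};

// Anti-pattern: each .bind() call returns a brand-new function object,
// so the remove call passes a reference that was never registered.
const el = new TargetMock();
el.addEventListener("someEvent", someFunction.bind(ctx));
el.removeEventListener("someEvent", someFunction.bind(ctx)); // silently does nothing
const stillAttached = el.listeners.size; // 1: handler (and its scope) stays alive

// Fix: bind once, keep the reference, and use it for both add and remove.
const el2 = new TargetMock();
const bound = someFunction.bind(ctx);
el2.addEventListener("someEvent", bound);
el2.removeEventListener("someEvent", bound);
const removed = el2.listeners.size; // 0
```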
I checked all usages of addEventListener/removeEventListener and they all appeared to be using functions that had been bound correctly, e.g.:
https://github.com/mozilla/treeherder/blob/c19e09be12847eee7ad4cea384a5b564875088ae/ui/job-view/details/tabs/AnnotationsTab.jsx#L140-L160

We're also planning on switching to class properties to do away with the need for bind() entirely in bug 1480166.
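For context on the class-properties approach mentioned above (the component name here is hypothetical, not from the Treeherder tree): an arrow-function class property captures `this` once per instance, so no .bind() call is needed and the listener reference is stable across renders, meaning add/removeEventListener pairs always match.

```javascript
class AnnotationsTabLike {
  constructor() {
    this.clicks = 0;
    // Old style would have required: this.onClick = this.onClick.bind(this);
  }

  // New style: bound automatically to the instance, stable reference.
  onClick = () => {
    this.clicks += 1;
  };
}

const tab = new AnnotationsTabLike();
const detached = tab.onClick; // detach, as an event dispatcher would
detached();                   // `this` still refers to `tab`
```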

However I did find the following...

These components have an addEventListener() but no corresponding removeEventListener() in componentWillUnmount():
https://github.com/mozilla/treeherder/blob/c19e09be12847eee7ad4cea384a5b564875088ae/ui/job-view/context/Pushes.jsx#L108
https://github.com/mozilla/treeherder/blob/c19e09be12847eee7ad4cea384a5b564875088ae/ui/shared/auth/Login.jsx#L29-L45

This component calls addEventListener() in componentDidUpdate() rather than componentDidMount() - which seems like it would create duplicate event listeners?
https://github.com/mozilla/treeherder/blob/c19e09be12847eee7ad4cea384a5b564875088ae/ui/job-view/context/SelectedJob.jsx#L68

This component uses a different type name with addEventListener() and removeEventListener():
https://github.com/mozilla/treeherder/blob/c19e09be12847eee7ad4cea384a5b564875088ae/ui/job-view/headerbars/SecondaryNavBar.jsx#L47-L52

This component appears as though it is trying to remove the event listener but I don't see how it is meant to work?
https://github.com/mozilla/treeherder/blob/c19e09be12847eee7ad4cea384a5b564875088ae/ui/shared/context/Notifications.jsx#L34-L55
(In reply to Ed Morley [:emorley] from comment #16)

Good leads you found here :-)
A few comments:

> This component calls addEventListener() in componentDidUpdate() rather than
> componentDidMount() - which seems like it would create duplicate event
> listeners?
> https://github.com/mozilla/treeherder/blob/c19e09be12847eee7ad4cea384a5b564875088ae/ui/job-view/context/SelectedJob.jsx#L68

This one is fine because addEventListener doesn't add a new listener if you call it with the same function, which is the case here. But for sure it would be more correct to add it in componentDidMount instead :-)
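The deduplication rule referenced in the reply above can be shown directly on a spec-compliant EventTarget (available as a global in Node 15+; the event type name is just an example): registering the same (type, listener) pair twice adds the listener only once.

```javascript
let calls = 0;
const listener = () => { calls += 1; };

const target = new EventTarget();
target.addEventListener("update", listener);
target.addEventListener("update", listener); // no-op: same reference already registered

target.dispatchEvent(new Event("update"));
// The listener runs once, not twice.
```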


While grepping, I also found these issues that shouldn't create a leak (they should be GCed properly) but could be performance issues, with binds happening in render:

https://github.com/mozilla/treeherder/blob/c19e09be12847eee7ad4cea384a5b564875088ae/ui/job-view/pushes/JobGroup.jsx#L138
https://github.com/mozilla/treeherder/blob/c19e09be12847eee7ad4cea384a5b564875088ae/ui/job-view/App.jsx#L247
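Why bind-in-render is a performance concern rather than a leak (a sketch, not the actual component code): every render that binds inline produces a fresh function object, so a child component doing a shallow prop comparison sees a "changed" prop on each render and re-renders unnecessarily.

```javascript
function filterCb() {}

// Simulate two successive renders that each do `filterCb.bind(this, ...)` inline:
const propFromRender1 = filterCb.bind(null, "group");
const propFromRender2 = filterCb.bind(null, "group");

// The references differ, even though the behavior is identical,
// which is what defeats shallow prop equality checks.
const stableAcrossRenders = propFromRender1 === propFromRender2; // false
// Binding once (e.g. in the constructor) gives a single stable reference instead.
```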
I grepped for setInterval too and found this possibly problematic piece: https://github.com/mozilla/treeherder/blob/c19e09be12847eee7ad4cea384a5b564875088ae/ui/job-view/App.jsx#L111

We call setInterval in `componentDidMount`; maybe it should be cleared in `componentWillUnmount` too, since currently it's cleared only in an `if` branch somewhere. I don't know whether App ends up being remounted a lot (if that's the case, maybe that's another issue).

Other locations with setInterval do it properly.
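A sketch of the cleanup pattern being suggested; the class and method names are illustrative assumptions, not Treeherder's actual App component. The key point is to keep the interval id and clear it unconditionally on unmount, rather than only inside a conditional branch elsewhere.

```javascript
class AppLike {
  componentDidMount() {
    // Store the id so it can always be cleared later.
    this.updateInterval = setInterval(() => this.fetchDeployedRevision(), 60000);
  }

  componentWillUnmount() {
    // Unconditional cleanup: prevents the timer (and everything its callback
    // closes over) from outliving the component.
    if (this.updateInterval) {
      clearInterval(this.updateInterval);
      this.updateInterval = null;
    }
  }

  fetchDeployedRevision() {}
}

const app = new AppLike();
app.componentDidMount();
const hadInterval = Boolean(app.updateInterval);
app.componentWillUnmount();
const cleared = app.updateInterval === null;
```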
See Also: 1507386
Thanks for all the insight and info!  I can at least address what you folks have found.  I really appreciate the extra eyes on this.  :)
Flags: needinfo?(cdawson)
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/964d81f8ac0a79fc5fd2ac28d4462a969d405ccd
Bug 1499551 - Cleanup add/remove event listeners and binding (#4312)

And also clear server update interval on App unmount.
See Also: → 1510280

Is this issue still occurring?

I can't say for sure if it's the same issue, but today I had to kill a content process because treeherder was making it unresponsive. I could capture a profile before killing it: https://perfht.ml/2Dg28x9

Ah that's good to know - and thank you for the profile.

Cameron, did you have any more ideas for this bug? :-)

Another profile captured today: https://perfht.ml/2JyXpNu

Leaving treeherder open in a background tab still makes the content process unusable: https://perfht.ml/2DUhWHe

Depends on: 1331399
Depends on: 1524128
Depends on: 1541544

I believe this is now remedied. I haven't heard complaints about memory consumption in a while. And, while Treeherder requires quite a bit of memory, it does not appear to grow unbounded any longer.

Please reopen if you run into more issues in this regard.

Status: REOPENED → RESOLVED
Closed: 6 years ago → 5 years ago
Resolution: --- → FIXED

I forgot a treeherder tab in the background again, and that made the content process unresponsive (it used about 6 GB of memory). Here's a profile of it, which is what let me identify the treeherder tab as the culprit: https://share.firefox.dev/2Tqpx7R

This looks pretty similar to the profile I had in comment 27; reopening.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

I can't tackle this any time soon until it is scheduled into a sprint. Removing myself.

Assignee: cdawson → nobody