Bug 894993 (Opened 6 years ago, Closed 6 years ago)

[Session Restore] SessionWorker takes lots of memory.

Categories

(Core :: JavaScript Engine, defect)

Version: 25 Branch
Type: defect
Priority: Not set

Tracking

Status: VERIFIED FIXED
Target Milestone: mozilla25
blocking-b2g: -
Tracking Status
firefox24 --- unaffected
firefox25 + verified
b2g18 --- affected
b2g18-v1.0.0 --- affected
b2g18-v1.0.1 --- affected
b2g-v1.1hd --- affected

People

(Reporter: Yoric, Assigned: bent.mozilla)

References

(Blocks 1 open bug)

Details

(Keywords: memory-footprint, perf, regression, Whiteboard: [MemShrink:P1] [c= s=2013.07.26])

Attachments

(1 file)

No description provided.
From the time I posted to now, the memory went from 83MB to 96MB.
This session began when I updated to this nightly, which I think happened 3h or 4h ago.
I have 4 app tabs: about:memory, Gmail (though I'm not logged in all the time), MozillaZine, and Facebook. Besides those, I have 1 or 2 tabs open for normal browsing.
Note to readers: This is a spinoff from https://bugzilla.mozilla.org/show_bug.cgi?id=891360#c12 .

If I understand Guilherme's observations correctly, some of the memory holding strings is never reclaimed until he hits 'Minimize memory usage'.
│    ├──110.64 MB (10.67%) -- worker(resource:///modules/sessionstore/SessionWorker.js, 0x7ff71d761800)
│    │  ├──106.94 MB (10.31%) -- zone(0x7ff715c72000)
│    │  │  ├───72.04 MB (06.95%) ++ string-chars/huge
│    │  │  ├───34.75 MB (03.35%) -- compartment(web-worker)
│    │  │  │   ├──34.28 MB (03.31%) -- objects-extra
│    │  │  │   │  ├──34.25 MB (03.30%) ── elements/non-asm.js
│    │  │  │   │  └───0.03 MB (00.00%) ── slots
│    │  │  │   └───0.47 MB (00.05%) ++ (6 tiny)
│    │  │  └────0.15 MB (00.01%) ++ (3 tiny)
│    │  └────3.70 MB (00.36%) ++ (4 tiny)

That's from my about:memory page. My sessionstore.js is about 1.8 MB.

Note that the PageThumbsWorker isn't really great at memory usage either:

│    ├───64.59 MB (06.23%) -- worker(resource://gre/modules/PageThumbsWorker.js, 0x7ff6e66f7800)
│    │   ├──61.46 MB (05.93%) -- zone(0x7ff6e6a15800)
│    │   │  ├──60.13 MB (05.80%) -- compartment(web-worker)
│    │   │  │  ├──58.01 MB (05.59%) -- objects-extra
│    │   │  │  │  ├──57.55 MB (05.55%) ── elements/non-asm.js
│    │   │  │  │  └───0.47 MB (00.04%) ++ (4 tiny)
│    │   │  │  └───2.11 MB (00.20%) ++ (6 tiny)
│    │   │  └───1.33 MB (00.13%) ++ (4 tiny)
│    │   └───3.13 MB (00.30%) ++ (4 tiny)
After hitting "Minimize memory usage" three times, both of these workers now moved to "Other measurements":

32.29 MB (100.0%) -- decommitted
├──26.78 MB (82.93%) ── js-non-window/gc-heap/decommitted-arenas
└───5.51 MB (17.07%) -- workers/workers()
    ├──2.17 MB (06.73%) ── worker(resource:///modules/sessionstore/SessionWorker.js, 0x7ff71d761800)/gc-heap/decommitted-arenas
    ├──2.16 MB (06.69%) ── worker(resource://gre/modules/PageThumbsWorker.js, 0x7ff6e66f7800)/gc-heap/decommitted-arenas
... And just a couple of minutes later the SessionWorker is almost as heavy as before, holding multiple session states as strings:

84.56 MB (08.57%) -- worker(resource:///modules/sessionstore/SessionWorker.js, 0x7ff71d761800)
Whiteboard: regr
[Sorry for the previous bugspam; pressed return by accident.]

Nominating for tracking-firefox25 because this is a regression from bug 891360.
Whiteboard: regr → [MemShrink]
Version: Trunk → 25 Branch
Assignee: nobody → ttaubert
I don't think I'm the right assignee for this bug. I have no idea about string GC in workers and why we're taking up so much memory here. We'll need help from someone a little more familiar with this topic.
Maybe khuey knows something about this, or knows whom to ask?
Yeah we're terrible at GCing on workers.
Assignee: ttaubert → nobody
Component: Session Restore → DOM: Workers
Product: Firefox → Core
Ben, khuey threw you under the bus - is this something you can help fix in the 25 cycle? Alternatively, can you suggest an alternate assignee?

A bunch of Session Store improvements we're working on depend on being able to move this I/O to a worker, and since session objects (or their serialization) can be rather large, we need to be GCing stuff properly.
Flags: needinfo?(bent.mozilla)
I think khuey is applying his usual editorial flourish a little too thickly here...

We have four mechanisms in place to GC a worker currently:

1 and 2: Workers GC on a timer while running (every 30 seconds) and on another timer when idle (after 5 seconds; this one actually gives memory back to the OS). If a worker constantly runs for short periods and then goes idle again, it's possible it never GCs from either timer.

3 and 4: Workers also GC when the JS engine "thinks" it should based on the normal memory preferences that we have. We check this immediately when the event queues are empty and then the JS engine also does this from time to time internally while running. We could tweak those preferences or start using the worker-specific overrides too.
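The two timer mechanisms above can be modeled roughly like this (a simplified sketch for illustration only; the 30 s / 5 s values come from the comment above, and the function name and state shape are hypothetical, not the actual Gecko internals):

```javascript
// Simplified model of the two worker GC timers described above.
// The numbers are from the comment (periodic GC every 30 s while running,
// shrinking GC after 5 s of idleness); everything else is hypothetical.
const PERIODIC_GC_MS = 30 * 1000;
const IDLE_SHRINKING_GC_MS = 5 * 1000;

function shouldGC(state) {
  // state = { running, msSinceLastGC, msIdle }
  if (state.running && state.msSinceLastGC >= PERIODIC_GC_MS) {
    return "periodic"; // mechanism 1: GC on a timer while running
  }
  if (!state.running && state.msIdle >= IDLE_SHRINKING_GC_MS) {
    return "shrinking"; // mechanism 2: shrinking GC when idle
  }
  return null; // neither timer has fired
}
```

In this model, a worker that wakes briefly and then idles for less than 5 seconds never satisfies either condition, which is the failure mode the comment warns about.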

This is obviously not bulletproof. The most obvious problem for ChromeWorkers is that if the worker is constantly blocked on I/O, the GC will be delayed or never run. I have some plans to fix that, but currently it's not on my work list.

So what is the SessionWorker code doing? Is it somehow missing all four of these mechanisms?

Maybe there's a bug in one of them, or maybe it's as simple as tweaking the prefs. Maybe we need to check the memory allocated at every event.
Flags: needinfo?(bent.mozilla)
The SessionWorker code is only doing trivial stuff with the strings:
* receive string;
* run a few tests on the string;
* optionally, some logging using dump();
* convert the string to an ArrayBuffer using TextEncoder.

The string is never stored anywhere.

Also, that ChromeWorker is idle most of the time (i.e. except on startup, it only receives a message every ~15 s and stays awake for only a few milliseconds), so there's a good chance we don't benefit from 1 and 2.

Note that this ChromeWorker calls js-ctypes at every call (the strings that annoy us are not passed as arguments), so this might play in somewhere.
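The string path listed above looks roughly like this (an illustrative sketch, not the actual SessionWorker.js source; the function name and the specific validation check are placeholders):

```javascript
// Illustrative sketch of the string handling described above; not the
// real SessionWorker.js. The incoming string is checked, optionally
// logged, encoded to bytes, and never stored anywhere.
function handleSessionMessage(stateString, { verbose = false } = {}) {
  // 1. Run a few sanity tests on the string (placeholder check).
  if (typeof stateString !== "string" || stateString.length === 0) {
    throw new TypeError("expected a non-empty session state string");
  }
  // 2. Optional logging, as with dump() in the worker.
  if (verbose) {
    console.log(`session state: ${stateString.length} chars`);
  }
  // 3. Convert the string to bytes via TextEncoder (UTF-8);
  //    the resulting Uint8Array is a view on an ArrayBuffer.
  const bytes = new TextEncoder().encode(stateString);
  return bytes; // the string itself is not retained
}
```

Nothing here keeps a reference to the string, which is why the 72 MB of `string-chars/huge` in the measurement above points at the engine failing to collect, not at the worker holding on to data.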
Ok, I poked around a little, and it looks like the idle timer is working as expected (though I forgot that the idle timer still just calls JS_MaybeGC; it doesn't force a shrinking GC). The JS engine just doesn't think it's worth GCing...
nbp, do you think you could look at this?  This is a big problem at the moment, because it's blocking the session restore rewrite feature.
Whiteboard: [MemShrink] → [MemShrink:P1]
Assignee: nobody → nicolas.b.pierron
(In reply to Nicholas Nethercote [:njn] from comment #14)
> nbp, do you think you could look at this?  This is a big problem at the
> moment, because it's blocking the session restore rewrite feature.

Not within 1 or 2 weeks.
Flags: needinfo?(n.nethercote)
gwagner and I looked a little more here and it definitely seems like a JS engine problem... The runtime's gcMallocBytes is a large negative number (0xff6b8004) and gcMaxMallocBytes is 0x08000000. We should have GC'd.
Ok, this is definitely a JS engine bug. Here's what's going wrong:

  JSRuntime::updateMallocCounter(JS::Zone * zone, unsigned int nbytes)
  JSRuntime::updateMallocCounter(unsigned int nbytes)
  js::MallocProvider<JSRuntime>::malloc_(unsigned int bytes)
  js::MallocProvider<JSRuntime>::new_<JS::Zone,JSRuntime *>(JSRuntime * p1)
  JSRuntime::init(unsigned int maxbytes)
  JS_NewRuntime(unsigned int maxbytes, JSUseHelperThreads useHelperThreads)
  `anonymous namespace'::CreateJSContextForWorker()

The JSRuntime constructor sets gcMallocBytes to 0. Then, JSRuntime::init allocates a Zone and subtracts its size from 0, leaving a negative number. From that point on the malloc trigger is basically dead.

I think this only happens on workers because we don't force a full GC at any point during normal operation, we just use JS_MaybeGC.
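The failure mode described here can be modeled with a countdown counter (a simplified illustration of the reported bug, not the actual JSRuntime code, which keeps this counter in C++; the constants are from comment 17):

```javascript
// Simplified model of the malloc-trigger bug described above.
// The counter is meant to start at maxMallocBytes and count down;
// a GC is requested when a subtraction makes it cross zero.
function makeMallocCounter(initialBytes) {
  let counter = initialBytes;
  let gcRequested = false;
  return {
    updateMallocCounter(nbytes) {
      const before = counter;
      counter -= nbytes;
      // Trigger only on the positive -> non-positive transition.
      if (before > 0 && counter <= 0) gcRequested = true;
    },
    get gcRequested() { return gcRequested; },
    get counter() { return counter; },
  };
}

const MAX_MALLOC_BYTES = 0x08000000; // gcMaxMallocBytes from comment 17

// Correct behavior: start at the max, so allocations eventually trigger.
const good = makeMallocCounter(MAX_MALLOC_BYTES);
good.updateMallocCounter(MAX_MALLOC_BYTES + 1); // gcRequested becomes true

// The bug: the constructor started the counter at 0, JSRuntime::init then
// subtracted the Zone's size, and the trigger edge could never fire again.
const buggy = makeMallocCounter(0);
buggy.updateMallocCounter(4096);              // Zone allocation during init
buggy.updateMallocCounter(200 * 1024 * 1024); // later allocations
// buggy.gcRequested stays false: the malloc trigger is dead
```

This matches the observed state in comment 17: a large negative `gcMallocBytes` with no GC ever requested.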
Component: DOM: Workers → JavaScript Engine
Flags: needinfo?(n.nethercote)
Attached patch changes.patch
Ok, this fixes the JS engine and then switches the worker idle timer back to a forced shrinking GC.
Assignee: nicolas.b.pierron → bent.mozilla
Status: NEW → ASSIGNED
Attachment #780655 - Flags: review?(wmccloskey)
Attachment #780655 - Flags: review?(khuey)
(In reply to ben turner [:bent] from comment #18)
> switches the worker idle timer back to a forced shrinking GC.

... because JS_MaybeGC is just not tuned right for workers. It won't try to GC based on time if there has never been a GC before, and it won't try to GC based on allocation until we've allocated 128mb (!!!). We need to fix JS_MaybeGC to be a bit smarter, and I'll talk with gwagner about this next week. The worker change here should be reverted once we fix it.
Comment on attachment 780655 [details] [diff] [review]
changes.patch

Oops, sorry about this.
Attachment #780655 - Flags: review?(wmccloskey) → review+
> ... because JS_MaybeGC is just not tuned right for workers. It won't try to GC based on time
> if there has never been a GC before, and it won't try to GC based on allocation until we've
> allocated 128mb (!!!). We need to fix JS_MaybeGC to be a bit smarter, and I'll talk with
> gwagner about this next week. The worker change here should be reverted once we fix it.

Where does 128mb come from? I've always had trouble understanding how the GC prefs are set in workers, but that seems very high for the initial trigger bytes, even in the browser.
(In reply to Bill McCloskey (:billm) from comment #21)
> Where does 128mb come from?

http://mxr.mozilla.org/mozilla-central/source/modules/libpref/src/init/all.js#828

We can override it for workers but guessing the right number could be tough.
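For context, the default being pointed at looks something like this in all.js (the pref name and value here are recalled from the 2013 tree and should be treated as an assumption; verify against the mxr link above):

```javascript
// Hedged sketch of the all.js default referenced above, not a verbatim
// copy: the threshold JS_MaybeGC consults before considering an
// allocation-based GC, expressed in MB.
pref("javascript.options.mem.high_water_mark", 128);
```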
https://hg.mozilla.org/mozilla-central/rev/f1e8ada1ddf7
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Do we need this on b2g18?
The zone stuff is on trunk only I think (right billm?)

We should make the same worker changes (remove the one JS_MaybeGC) on b2g18 too.
Flags: needinfo?(wmccloskey)
Yes, zones are not on b2g18.
Flags: needinfo?(wmccloskey)
blocking-b2g: --- → leo?
Would it be hard to write a sketch of the patch for b2g18 while this is still leo?? That might make it easier to argue this case to release drivers. Although maybe that doesn't really matter, since Leo is cherry-picking off b2g18 anyway.
Keywords: perf
Whiteboard: [MemShrink:P1] → [MemShrink:P1] [c= s=2013.07.26]
Ben, this does not seem like a blocker, and we are not taking any new perf improvements at this point unless we have already committed. Also, this is 1.0.1-affected, so I'm not sure why we would need this on 1.1.
blocking-b2g: leo? → -
> we are not taking any new perf improvements at this point, unless we have already committed

We've been taking a lot of perf improvements at leo's behest.

It would be frustrating if they hit this bug in their last round of testing and we then had to scramble to fix it. Perhaps you can coordinate with them to confirm that this is not desired.
(In reply to Justin Lebar [:jlebar] from comment #29)
> Would it be hard to write a sketch of the patch for b2g18

It's identical to the m-c patch, RuntimeService.cpp changes only.
(In reply to ben turner [:bent] from comment #32)
> (In reply to Justin Lebar [:jlebar] from comment #29)
> > Would it be hard to write a sketch of the patch for b2g18
> 
> It's identical to the m-c patch, RuntimeService.cpp changes only.

Don't we want the resetGCMallocBytes change as well? That's broken on b2g18 as far as I can tell.
Hm, I guess so. Zones trigger the bug but don't cause it.
Guilherme, are you still experiencing this issue in Firefox 25?
Flags: needinfo?(guijoselito)
I'm a Nightly user and don't have Firefox 25 (but I can download it if you need).
But I remember that the first nightly with this patch made it behave like it does now: the memory consumption is always
4.27 MB (11.16%) -- workers/workers()
    ├──1.43 MB (03.72%) ── worker(resource://gre/modules/PageThumbsWorker.js, 0x64da800)/gc-heap/decommitted-arenas
    ├──1.43 MB (03.72%) ── worker(resource://gre/modules/osfile/osfile_async_worker.js, 0x6826000)/gc-heap/decommitted-arenas
    └──1.42 MB (03.71%) ── worker(resource:///modules/sessionstore/SessionWorker.js, 0x7252400)/gc-heap/decommitted-arenas

100% of the time, the memory is 4 MB.
Flags: needinfo?(guijoselito)
I think that's enough to call this verified. Thank you Guilherme.
Status: RESOLVED → VERIFIED