Closed Bug 1059139 Opened 7 years ago Closed 2 years ago

[Meta] Prototyping a Memory Profiler

Categories

(Core :: Gecko Profiler, defect)

Priority: Not set
Severity: normal


RESOLVED INVALID

People

(Reporter: laszio.bugzilla, Unassigned)

References

(Depends on 3 open bugs, Blocks 1 open bug)

Details

(Keywords: meta)

Attachments

(2 files, 1 obsolete file)

15.76 KB, application/x-tgz
Details
4.02 KB, application/javascript
Details
Attached file poc-mem-profiler.tgz (obsolete) —
This bug was opened to facilitate discussion among people in different timezones.

Attached is a proof-of-concept implementation that samples the GC heap by growth (i.e. every N bytes) to approximate the retained size, peak retained size, and accumulated size of each function over a period of time. It currently does not work with generational GC and must be invoked via a debugger (e.g. gdb). Obtaining the stack trace is somewhat ad hoc: it first looks for the JS stack of the running thread; if that is not found, it tries the main thread of the same JSRuntime. Eventually it will be replaced by a new implementation based on Debugger.Memory.
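
For readers who only want the gist of the sampling scheme, here is a minimal JS sketch of the sampling-by-growth idea; the names (SAMPLE_INTERVAL, recordSample, captureStack) are made up for illustration, and the actual PoC does this in C++ inside the GC allocation path:

// Hypothetical sketch, not the PoC itself: sample the heap every N bytes of
// growth and charge the whole interval to the allocating call stack.
const SAMPLE_INTERVAL = 64 * 1024;          // "every N bytes" of heap growth
let bytesUntilNextSample = SAMPLE_INTERVAL;

function onAllocation(size, captureStack) {
  bytesUntilNextSample -= size;
  while (bytesUntilNextSample <= 0) {
    // On average each call stack is charged in proportion to the bytes it allocates.
    recordSample({ stack: captureStack(), bytes: SAMPLE_INTERVAL });
    bytesUntilNextSample += SAMPLE_INTERVAL;
  }
}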

Known TODOs are:
1. Count non-GC-heap objects: bug 1056992, bug 1057057, bug 1057093.
2. Bug 1056373: Debugger.Memory allocation log and related functions should support sampling, not just full logging
3. What will the {user, application} interfaces be? How can the information be presented to be useful?
4. A more efficient way to get the stack trace: bug 1028418 and deferring symbolication.

Please feel free to raise questions and ideas. Thanks!
To give users early access and to explore ways of using the APIs, I set up a cookbook (as requested by Thinker) to collect examples of memory profiling on Gecko. Currently there are only two examples: one copied directly from [1], and the other a slight modification of the first.

https://github.com/ting-yuan/MemoryProfilingCookbook

[1] https://developer.mozilla.org/en-US/docs/Tools/Debugger-API/Tutorial-Allocation-Log-Tree
While reading the Debugger.Memory documentation, I realized that the "allocation log" is quite close to what a profiler needs. (Before reading the docs, I had mistakenly assumed it was some kind of text log that required a parser.) However, it lacks the allocation size of each event, so the bytes dimension is impossible in the profiler.

It appears to me that the size information could be included without significant overhead. If you agree that this is the right thing to do, I'd be glad to implement it. May I have your opinion?
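
To make the proposal concrete, here is a rough sketch of how a profiler could consume the allocation log if each entry carried a byte size. The size field is exactly the addition being proposed and does not exist today; debuggeeGlobal stands for whatever global is being observed.

// Sketch only: entry.size is the proposed field; everything else is existing
// Debugger.Memory API (trackingAllocationSites, drainAllocationsLog, SavedFrame).
const dbg = new Debugger(debuggeeGlobal);
dbg.memory.trackingAllocationSites = true;

// ... let the debuggee run for a while ...

const bytesBySite = new Map();
for (const entry of dbg.memory.drainAllocationsLog()) {
  const site = entry.frame ? entry.frame.source + ":" + entry.frame.line : "(no stack)";
  bytesBySite.set(site, (bytesBySite.get(site) || 0) + entry.size /* proposed */);
}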
Flags: needinfo?(jimb)
(In reply to Jim Blandy :jimb from comment #14)
> We could get accurate free counts by disabling generational GC and
> off-thread sweeping, or by adding instrumentation as I suggest in comment 9.
> 
> But we could also get accurate usage counts with a census, and that would
> give us retained sizes, which are much more meaningful.

(I'm replying here so as not to distract from the purpose of bug 1056373.)

Yes, I agree that it's non-trivial (and perhaps ugly and incompatible with the current design) to keep track of deallocation events, so I'd like to justify whether we really need that and what price we would pay for it.

In some situations, e.g. on platforms with a tight memory budget, the peak is as meaningful as the retained size. A high memory peak is a source of memory pressure and a likely cause of performance problems (swapping, forced full GCs, etc.). On Firefox OS and Android, how many apps can run in the background simultaneously without being killed by the low-memory killer is an important metric (at least to phone makers, and perhaps to end users), and it largely depends on peak memory usage.

I have also received positive feedback from Gaia developers about the ability to observe memory peaks.

The peak directly reflects the memory requirement. There are some approximations, but I'm not sure whether they are good enough. A function that allocates more (which can be observed from allocation logs) is likely to have a higher peak, but that is not always the case. Plotting the allocation events along a timeline gives some intuition, but it is still not perfect.

Besides the implementations and their implications explained by Jim, another approach that comes to mind is to keep a table of the addresses of sampled-and-retained objects and garbage collect it as well. In my experiments, a naive implementation seems to slow things down by 30%-50% on memory-intensive benchmarks. I believe it can be improved substantially; ideally, the overhead would be proportional to the probability that an object is sampled.
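
As a userland analogy of that table (the real one would live inside the collector and be swept together with the heap), the idea looks roughly like this; FinalizationRegistry is used only to illustrate "learn when a sampled object dies", and recordFree is a hypothetical profiler hook:

// Analogy only: track sampled allocations without keeping them alive, and emit
// a free event when the collector reclaims them.
const live = new Map();                     // sampleId -> { bytes, stack }
let nextId = 0;
const registry = new FinalizationRegistry(id => {
  const info = live.get(id);
  if (info) {
    recordFree(info);                       // hypothetical profiler hook
    live.delete(id);
  }
});

function sampleAllocation(obj, bytes, stack) {
  const id = nextId++;
  live.set(id, { bytes, stack });           // metadata only; no strong ref to obj
  registry.register(obj, id);
}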

Even though these solutions are not free, we only pay the cost when the profiler is enabled.

So, would you please reconsider de-allocation events?
(In reply to Ting-Yuan Huang from comment #2)
> When reading the documents of Debugger.Memory, I realized that the
> "allocation log" is quite close to what a profiler needs. (Before reading
> the docs, I mistakenly regarded it as some kind of text logs which requires
> a parser.) However, it lacks the allocation sizes of the corresponding
> events so the bytes dimension is impossible in the profiler.
> 
> It appears to me that it's possible to include the size information without
> significant overheads. If you agree that this is the right thing to do, I'm
> glad to implement it. May I know your opinion?

It would be fine to add more information to the log. But getting useful data this way might be harder than it seems.

JSObjects are usually all the same size when they're allocated. They grow later, as properties are added. We only have the hooks to collect data at allocation time, not when properties are added. So at present we can't find the object's real size.

Secondly, the "storage" consumed by a JSObject, as far as the JS developer is concerned, might well be in the things it points to: strings, for example. We don't have allocation logs for strings yet.
Flags: needinfo?(jimb)
Note that I've filed bugs for some of these things:

Sizes (the notable exception to what jimb says is Typed Arrays, which are allocated up front): https://bugzilla.mozilla.org/show_bug.cgi?id=1068988

Strings (and others): https://bugzilla.mozilla.org/show_bug.cgi?id=1068990
(In reply to Ting-Yuan Huang from comment #3)
> In some situations, e.g. on platforms with tight memory budget, the peak is
> as meaningful as the retained size. A high memory peak is a source of memory
> pressure and likely a cause of performance problems (swaps, forced full GCs,
> etc). On Firefox OS and Android, it is an important index (at least to phone
> makers and perhaps end users) that how many apps can run in background
> simultaneously without being killed by low-mem-killer. This largely depends
> on the peak usage of memory.
> 
> I also received some positive feedbacks from gaia developers on the ability
> to observe memory peaks.

Doesn't the nsIMemoryReporterManager sizeOfTab method already give you better peak memory usage data than a JSObject allocation log possibly could?

https://hg.mozilla.org/mozilla-central/file/4f2cac8d72da/xpcom/base/nsIMemoryReporter.idl#l434

Here's how it's used in the current devtools sources:

https://hg.mozilla.org/mozilla-central/file/4f2cac8d72da/toolkit/devtools/server/actors/memory.js#l121

That gives you sizes which include all the allocation overhead: partially-filled GC arenas, strings, everything. Wouldn't this be a much more accurate indicator of peak memory consumption than something derived from looking at JSObjects alone?

The worst overhead we've seen for a call to sizeOfTab is 20-30ms.
I would imagine that a display that correlates allocation sites with sizeOfTab results would be extremely helpful in assessing peak usage. sizeOfTab will yield much more real-world data than any technique based on tracking allocations and deallocations and trying to estimate how much memory became freed.
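
A rough sketch of what such chrome-privileged polling could look like follows; the contract ID is real, but the out-parameter names below are only illustrative, and the authoritative signature of sizeOfTab is in the nsIMemoryReporter.idl linked above:

// Poll sizeOfTab periodically and remember the peak. Given the 20-30 ms worst
// case, a polling interval of a few hundred ms seems reasonable.
const mgr = Cc["@mozilla.org/memory-reporter-manager;1"]
              .getService(Ci.nsIMemoryReporterManager);

let peak = 0;
function pollPeak(win) {
  // Out-parameters are objects whose .value gets filled in; see the IDL for
  // the exact list and order (assumed here).
  const jsObjects = {}, jsStrings = {}, jsOther = {},
        dom = {}, style = {}, other = {}, jsMs = {}, nonJSMs = {};
  mgr.sizeOfTab(win, jsObjects, jsStrings, jsOther, dom, style, other, jsMs, nonJSMs);
  const total = jsObjects.value + jsStrings.value + jsOther.value +
                dom.value + style.value + other.value;
  peak = Math.max(peak, total);
}

// e.g. setInterval(() => pollPeak(someTabWindow), 500);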
Victor Porof will soon land a timeline presenting data from sizeOfTab, for bug 1069421.
(In reply to Jim Blandy :jimb from comment #6)
> Doesn't the nsIMemoryReporterManager sizeOfTab method already give you
> better peak memory usage data than a JSObject allocation log possibly could?

Hmm, this is much too expensive to call on every function entry and exit. So it doesn't help assign blame to particular functions.
Here's another complication:

JSObjects, JSStrings, and JSScripts are all allocated in "arenas", fixed-size blocks of memory (like a page) containing exclusively one kind of thing. The things have a fixed size, so the arena is like an array of thing-sized spaces, which might be in use, or free. Each arena tracks its own free and used areas.

For each kind of thing, we keep a linked list of all the arenas holding that kind of thing that have any available spaces. To allocate a thing, we take the first arena that has a free space, find that arena's first free space, and construct the thing there.

So allocating a thing is pretty cheap, in the common case where there's an arena that has more than one free space. But if we use up the last space in an arena, we have to take it off the list of arenas with free space; and if we have no arenas on that list, then we have to allocate a fresh one.

One consequence of this design is that, until an arena is completely empty, we can't return its storage to the system. So, one long-lived object can keep a whole arena alive, even if the rest of that arena is entirely unused.
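
To restate the mechanics in code form, here is a toy model of the scheme just described (illustration only; the real implementation is C++ inside the GC and considerably more involved):

// One arena holds fixed-size slots for exactly one kind of thing; a per-kind
// list tracks arenas that still have free slots.
const SLOTS_PER_ARENA = 64;

function newArena() {
  return { used: new Array(SLOTS_PER_ARENA).fill(false), free: SLOTS_PER_ARENA };
}

const arenasWithSpace = { object: [], string: [], script: [] };

function allocateThing(kind) {
  let arena = arenasWithSpace[kind][0];
  if (!arena) {                             // no arena with space: get a fresh one
    arena = newArena();
    arenasWithSpace[kind].push(arena);
  }
  const slot = arena.used.indexOf(false);   // first free slot in that arena
  arena.used[slot] = true;
  if (--arena.free === 0)                   // last slot used: drop it from the list
    arenasWithSpace[kind].shift();
  return { arena, slot };
}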

Now, if we want to measure peak memory usage, we certainly want to include unused arena space in our measurements. But to which function do we charge the allocation of a new arena?

Perhaps function f asked for a new JSObject, and that forced us to allocate a new arena. So we should blame f.

But suppose another function g then allocates another JSObject from that arena, f's JSObject is deallocated, and a reference to g's JSObject is stored someplace for a long time. Now it's really g's JSObject that is responsible for holding that arena alive; f's is long gone. It makes more sense to blame g.

But the fact that f and g both allocated JSObjects in the same arena is coincidence. Depending on the timing of GC, they could easily have ended up placing objects in different arenas.

Our logs could certainly report whether the object was the first in a new arena, or the last in an old arena.
(In reply to Jim Blandy :jimb from comment #4)
> JSObjects are usually all the same size when they're allocated. They grow
> later, as properties are added. We only have the hooks to collect data at
> allocation time, not when properties are added. So at present we can't find
> the object's real size.

It seems to me that careful analysis is needed to track memory usage on a per-object basis. If a set of dynamic slots is to be allocated, we could either update the corresponding allocation log entry or simply attribute it to the current call stack (i.e. a new log entry). The latter should be easier, although the semantics would be subtly different.
(In reply to Jim Blandy :jimb from comment #10)
> Here's another complication:

In memory-reporter terms, this is unused gc-things, right? As you said, this problem is difficult. However, something similar happens with traditional malloc/free: in modern OSes, physical memory is managed in pages, so a tiny allocation may occupy a whole page, which is 4 KB or 8 KB on most machines. I agree that it's a little contradictory to reflect the delayed-free nature of a mark-sweep collector but not this. The problem is implementation-specific; although a mark-sweep collector is also an implementation detail, it's very common and should be (I think) more acceptable to JS developers.
Sorry for replying so late. I spent a couple of days tracing the code and thinking about the difficulties. Would you mind if I worked on this directly? That is, would you be available to discuss and review my patches to Debugger.Memory?
Similar to the latter (and maybe easier?), but not quite the same: we could expose new Shape allocations. The catch would be that you would only get notified about extended properties the first time they are added. If that's a deal breaker, then forget about it.
Attached file memory_profiler.tgz
Updated the patches. The parent changeset is 209225:e4cfacb7683.
Attachment #8479691 - Attachment is obsolete: true
Attached file Example.js
Attaching Example.js, which demonstrates how to use the API exposed by the patches from the browser/web console.

mpstart() starts profiling
mpstop() stops profiling

GetProfileResults() returns three tables in an object: {names: ["foo()", ... /* function name table */], traces: [{parentIdx: int, nameIdx: int}, ... /* stack frames */], allocated: [{size: int, timestamp: int, traceIdx: int}, ... /* allocation/free events */]}

example1(result) shows the top 20 functions that retain the most
example2(result) shows the top 20 functions that allocated the most
example3(result) shows the top 20 functions with the highest peak.
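
Based on the table shapes described above, a sketch of example2-style aggregation might look like the following; it assumes, only for illustration, that free events carry non-positive sizes and that the root frame's parentIdx is -1:

// Total bytes allocated per function, charging every frame on the stack.
function bytesAllocatedPerFunction(result) {
  const totals = new Map();                 // function name -> bytes
  for (const event of result.allocated) {
    if (event.size <= 0) continue;          // skip free events (assumption)
    for (let i = event.traceIdx; i >= 0; i = result.traces[i].parentIdx) {
      const name = result.names[result.traces[i].nameIdx];
      totals.set(name, (totals.get(name) || 0) + event.size);
    }
  }
  return [...totals].sort((a, b) => b[1] - a[1]).slice(0, 20);
}

// Usage: mpstart(); /* exercise the page */ mpstop();
// console.table(bytesAllocatedPerFunction(GetProfileResults()));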
(In reply to Ting-Yuan Huang from comment #15)
> Created attachment 8535349 [details]
> memory_profiler.tgz
> 
> Update the patches. Parent changeset is 209225:e4cfacb7683

Oops, I forgot to mention that this relies on part of SPS. Please export B2G_PROFILING=1 in your .userconfig if you are building Firefox OS.

Desktop Firefox should require no additional treatment.
See Also: → 1123237
Depends on: 1123237
Depends on: 1474383

I was looking for a meta bug to attach some of my memory work to. Since this bug was about prototyping a memory profiling feature but is not the current path we're exploring, I'm going to close it in favor of other bugs.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INVALID