Closed Bug 842800 (Opened 11 years ago, Closed 3 years ago)
Track peak memory usage
Categories: Core :: General, defect
Status: RESOLVED WONTFIX
People: Reporter: n.nethercote; Assigned: n.nethercote
Whiteboard: [MemShrink:P2]
Attachments: 1 file, 1 obsolete file
about:memory is great, but it gives us zero insight into peak memory consumption. Measuring peak consumption isn't easy to do in a fine-grained fashion, but it should be possible to get it for the C heap and JS heap without too much trouble. That should be enough to give us a sense of whether peak memory consumption is significantly higher than the steady-state memory consumption reported by about:memory.
Comment 1 (Assignee) • 11 years ago
Another idea is to periodically measure RSS, to get its peak.
Updated (Assignee) • 11 years ago
Whiteboard: [MemShrink] → [MemShrink:P2]
Updated (Assignee) • 11 years ago
Assignee: nobody → n.nethercote
Comment 2 • 11 years ago
Something we probably want to do in this bug or in a follow-up is to give users a way to reset the high water mark. That way you can more easily debug things like "what's the high water mark when I do X?"
Comment 3 (Assignee) • 11 years ago
> Something we probably want to do in this bug or in a follow-up is to give
> users a way to reset the high water mark.
The simplest implementation of that would be to reset it every time it's requested. Might be too inflexible, though. We'd have to see.
Comment 4 (Assignee) • 11 years ago
I'm going to change the direction of this bug slightly and make it a bit more ambitious. The hard part about measuring peak memory consumption is knowing when you're at the peak. My goal here is to have a "peak-finding" mode, probably triggered in about:memory, which runs slightly slower than normal. Ideally it would work in vanilla builds.

Here's the basic approach.
- Have (a) a cheap way to measure "total" memory consumption, and (b) an expensive, detailed measurement of memory.
- Do the cheap total measurement frequently.
  - How frequently determines the likelihood of missing a peak, and affects the slow-down.
- When the cheap total measurement hits a new peak, if the peak is X% higher than the previous peak total, do an expensive measurement.
  - The value of X determines how close we get to the peak, and affects the slow-down.

For our implementation, the obvious candidate for the expensive, detailed measurement is a memory report dump to $TMPDIR. And X% can be user-selected via about:memory; 5% might be a reasonable default. So that's easy. The hard part is choosing the cheap measurement, and its frequency of measurement.

Some possibilities for the measurement...
- |nsMemoryReporterManager.explicit|. It's moderately expensive. We certainly couldn't see every change to it.
- jemalloc's "heap-allocated" measurement. The current implementation is moderately expensive because it measures lots of other things at the same time, but we could hack mozjemalloc to improve that. We might be able to see every change to it, but that would ignore non-heap allocations.
- RSS, which is the most important measurement for mobile OOMs(?). So measure it via |nsMemoryReporterManager.resident|.
  - On Linux/Android this opens and reads /proc/self/statm (a single-line file).
  - On Windows it calls GetProcessMemoryInfo().
  - On Mac it calls GetTaskBasicInfo(), and if you want it to be accurate, also purges jemalloc's freed pages... ugh.
  Again, we couldn't see every change to it.

None of these are all that cheap. (E.g. the cheapest possible measurement would be simply reading a global counter.) So we couldn't check them very frequently.

Some possibilities for the peak checking frequency...
- On a timer -- e.g. every second, half-second, quarter-second, whatever. (User-controllable.)
- Every Nth event in the event loop. (Also user-controllable.)
- In response to "check-peak" events, which could be triggered by our allocators (jemalloc, JS heap) every time a large chunk is allocated?
- Some combination of the above.

It's unclear how short-lived our peaks are, and therefore how frequently we need to measure to catch them. Hopefully not too short-lived.
Summary: Track peak memory usage of the C and JS heaps → Track peak memory usage
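For reference, the Linux branch of the cheap RSS measurement described above boils down to reading /proc/self/statm, whose second field is the resident page count. A minimal standalone sketch (the helper name is made up; the real code lives behind |nsMemoryReporterManager.resident|):

```cpp
#include <cstddef>
#include <fstream>
#include <unistd.h>

// Hypothetical helper: returns the resident set size in bytes, or 0 on
// failure. /proc/self/statm is a single line; its second field is the
// number of resident pages.
size_t ResidentBytes() {
    std::ifstream statm("/proc/self/statm");
    size_t totalPages = 0, residentPages = 0;
    if (!(statm >> totalPages >> residentPages))
        return 0;
    return residentPages * static_cast<size_t>(sysconf(_SC_PAGESIZE));
}
```

Even this "cheap" path does a file open, read, and parse per sample, which is why checking it on every allocation is off the table.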
Comment 5 (Assignee) • 11 years ago
To start with, I'll try measuring RSS on Linux a few times a second, and see how that works.
Comment 6 • 11 years ago
A few comments/questions:
- I'm a bit wary of time-based peak sampling; you can (and we are in these asm.js demos) allocate a ton of memory in a short period of time. What about hooking mmap? It seems like mmap calls would be rather more infrequent than malloc.
- In addition to dumping about:memory to $TMPDIR, could we also get the stack trace of the malloc/mmap call that hit the peak? The two together would, I think, be a pretty useful combo.
- It'd be great if we could see more than just the max peak. E.g., right now I'm pretty sure the peak for the UE3 demos is the parse nodes, but there are 3 or more shorter peaks after that that I'd like to understand as well. Perhaps a large free() call could reset the peak measurement?
Comment 7 • 11 years ago
What we did on Windows with AvailableMemoryTracker may be instructive. There, we wrapped VirtualAlloc (the equivalent of mmap), and every time a large VirtualAlloc came through, we'd read RSS, which is fast compared to running VirtualAlloc.

On Windows, you have to explicitly commit your memory when you call VirtualAlloc (although I don't know if memory committed this way immediately shows up in RSS). This is different from *nix, where if you mmap 100mb, the OS doesn't reserve any space in memory or the swap file until you touch a page. The fact that we knew how much memory was being committed when VirtualAlloc was called made this scheme somewhat reasonable.

I can imagine doing something similar on *nix. We'd wrap mmap and then maybe set up write barriers on the resultant pages, so we know when they are committed? But this seems like a lot of work, when it's not clear to me that simply dumping about:memory to JSON once a second is unworkable. Has that been tried?
Comment 8 • 11 years ago
> - It'd be great if we could see more than just the max peak.
I think this is a pretty important case to consider. As soon as you see the peak and establish either that you can't fix it or that it's being fixed, you'll want to see the other peaks.
Comment 9 (Assignee) • 11 years ago
> - In addition to dumping about:memory to $TMPDIR, could we also get the
> stack trace of the malloc/mmap call that hit the peak? The two together
> would, I think, be a pretty useful combo.
Should be pretty easy. BTW, I think you'll like the memory reporting that Benoit is putting into SPS right now :)
Comment 10 (Assignee) • 11 years ago
I have a draft patch working, but there's a problem. Here's how it works.
- about:memory has "start peak-finding mode" and "stop peak-finding mode" buttons.
- When you start peak-finding mode, a new thread is spawned. It loops, reading RSS five times a second. If the RSS is 10% higher than it was at the last dump, it dumps reports to $TMPDIR.
- When you stop peak-finding mode, the thread is canceled.

Pretty simple. But in the "dump reports" step it ends up in nsMemoryReporterManager::EnumerateReporters(), which aborts because it isn't the main thread:

  NS_IMETHODIMP
  nsMemoryReporterManager::EnumerateReporters(nsISimpleEnumerator **result)
  {
      // Memory reporters are not necessarily threadsafe, so EnumerateReporters()
      // must be called from the main thread.
      if (!NS_IsMainThread()) {
          MOZ_CRASH();
      }

So I thought "aha, I'll just spawn a runnable to do the dumping, and dispatch it to the main thread!" That works, in that it avoids the crash. But the runnable can be significantly delayed. For example, I ran the Unreal demo and it spawned 15 dump runnables, but none of them actually dumped until the memory-intensive start-up sequence had finished -- i.e. I missed all the interesting bits, and ended up with lots of dumps that occurred all in quick succession and so gave almost identical measurements.

I guess I need to change things so that reporters are thread-safe...?
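The sampling thread described above can be sketched portably with std::thread. The class and callback names here are illustrative, not the patch's actual code, and the dump callback merely stands in for the report dump, which, as this comment explains, can't actually run safely off the main thread:

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>

// Illustrative sketch of the peak-finding thread: sample RSS periodically
// and invoke a dump callback whenever RSS is >=10% above the RSS recorded
// at the previous dump. Both callbacks are placeholders, not Gecko APIs.
class PeakFinder {
public:
    PeakFinder(std::function<size_t()> sampleRss,
               std::function<void(size_t)> dump)
        : mSampleRss(std::move(sampleRss)), mDump(std::move(dump)) {}

    void Start() {
        mRunning = true;
        mThread = std::thread([this] {
            size_t lastDumpRss = 0;
            while (mRunning) {
                size_t rss = mSampleRss();
                if (rss > lastDumpRss + lastDumpRss / 10) {
                    mDump(rss);        // placeholder for "dump reports"
                    lastDumpRss = rss;
                }
                std::this_thread::sleep_for(std::chrono::milliseconds(200));
            }
        });
    }

    void Stop() {
        mRunning = false;
        if (mThread.joinable())
            mThread.join();
    }

private:
    std::function<size_t()> mSampleRss;
    std::function<void(size_t)> mDump;
    std::atomic<bool> mRunning{false};
    std::thread mThread;
};
```

Note that because lastDumpRss starts at zero, the first successful sample always triggers a dump, which gives a baseline to compare subsequent peaks against.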
Comment 11 (Assignee) • 11 years ago
Hmm. I reverted to the "dump from the peak-finding thread" version and disabled the assertion in EnumerateReporters() to see what would happen. I immediately hit a similar assertion:

  nsXPConnect*
  nsXPConnect::GetXPConnect()
  {
      // Do a release-mode assert that we're not doing anything significant in
      // XPConnect off the main thread. If you're an extension developer hitting
      // this, you need to change your code. See bug 716167.
      if (!MOZ_LIKELY(NS_IsMainThread() || NS_IsCycleCollectorThread()))
          MOZ_CRASH();

This function is called by the JS reporters to get ahold of the main JS runtime. Hmm. So I think I need a way to say "run this function on the main thread, right now, interrupting whatever it's doing". I'm not optimistic we have such a thing.
Comment 12 (Assignee) • 11 years ago
Or, a way to periodically (and frequently) run the peak-finding function on the main thread itself would suffice. I also can't see how that would be achieved. ("Hey, let's interrupt the JS parser in the middle of parsing this huge script to run some arbitrary Gecko function...")
Comment 13 • 11 years ago
(In reply to Nicholas Nethercote [:njn] from comment #11)
> So I think I need a way to say "run this function on the main thread, right
> now, interrupting whatever it's doing". I'm not optimistic we have such a
> thing.

You can get this on Mac/Linux with signaling, but not Windows AFAIK. You want to tgkill with the main thread's TID. The registered signal handler will run on the main thread. However, everything you call must be signal-safe, as you know, so calling Gecko code will again immediately blow up.
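The signaling approach can be sketched with pthread_kill, the portable cousin of tgkill: a helper thread signals the main thread, and the handler runs on the main thread but may only do async-signal-safe work, which is exactly the limitation noted above. All names here are made up for illustration:

```cpp
#include <csignal>
#include <pthread.h>

// Illustrative sketch: a helper thread interrupts the main thread with a
// signal. The handler runs on the main thread, but may only do
// async-signal-safe work -- here, setting a flag.
static volatile sig_atomic_t gCheckPeak = 0;
static pthread_t gMainThread;

static void OnSigUsr1(int) {
    gCheckPeak = 1;  // Safe; calling into Gecko here would not be.
}

static void* PeakThread(void*) {
    pthread_kill(gMainThread, SIGUSR1);  // Deliver to the main thread.
    return nullptr;
}

int RunDemo() {
    gMainThread = pthread_self();
    struct sigaction sa{};
    sa.sa_handler = OnSigUsr1;
    sigaction(SIGUSR1, &sa, nullptr);

    pthread_t t;
    pthread_create(&t, nullptr, PeakThread, nullptr);
    pthread_join(t, nullptr);
    return gCheckPeak;  // 1 once the signal has been handled
}
```

Since a report dump calls deep into non-signal-safe Gecko code, the most a handler could do is set a flag like this and hope the main thread checks it soon, which is effectively the operation-callback idea.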
Comment 14 • 11 years ago
> So I think I need a way to say "run this function on the main thread, right now, interrupting
> whatever it's doing". I'm not optimistic we have such a thing.
You can try the mechanism used by nsThread::ScheduleMemoryPressureEvent. It's pretty simple. You're still going to wait for the next turn of the event loop, though.
Comment 15 (Assignee) • 11 years ago
> You can get this on mac/linux with signaling but not windows AFAIK. You want
> to tgkill with the main thread's TID. The registered signal handler will run
> on the main thread. However everything you call must be signal safe as you
> know so calling Gecko code will again immediately blow up.

So... no hope there.

> You can try the mechanism used by nsThread::ScheduleMemoryPressureEvent.
> It's pretty simple. You're still going to wait for the next turn of the
> event loop, though.

I tried that; it's still not immediate enough for the Unreal demo -- 13 dump requests were made before we got back to the event loop. I'm out of ideas :(
Comment 16 (Assignee) • 11 years ago
Hmm... the JS engine has its "operation callback", which it checks and possibly runs during long-running operations. I might be able to use that. It currently doesn't get checked during parsing, but it should be possible to add a check there. Assuming that can be dealt with, it's unclear how many other places in the browser have potentially long-running, memory-intensive operations that take place between iterations of the main event loop. We'll just have to suck it and see.
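For illustration, the operation-callback pattern boils down to long-running work periodically polling a cheap atomic flag and, when it is set, running a registered callback inline. This is a sketch with made-up names, not SpiderMonkey's actual API:

```cpp
#include <atomic>
#include <functional>

// Sketch of an "operation callback": long-running work polls a cheap flag
// and, when it is set, runs a registered callback on the working thread.
std::atomic<bool> gCallbackPending{false};
std::function<void()> gOperationCallback;

inline void CheckOperationLimit() {
    if (gCallbackPending.exchange(false))
        gOperationCallback();  // e.g. dump memory reports at the peak
}

long ParseLikeWorkload(long iterations) {
    long checksum = 0;
    for (long i = 0; i < iterations; i++) {
        checksum += i;              // stand-in for parsing work
        if (i % 4096 == 0)
            CheckOperationLimit();  // poll added inside the long-running loop
    }
    return checksum;
}
```

The trade-off is the one discussed throughout this bug: the poll must be cheap enough to sprinkle into hot loops, yet frequent enough that a peak isn't long gone by the time the callback runs.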
Comment 17 (Assignee) • 11 years ago
The operation callback does the trick! Here's a rough draft patch that uses it. Things to note:
- Every time we hit a peak, the patch (a) triggers a JS engine operation callback, and (b) schedules an event on the main Gecko thread. Whichever is hit first does the dump, and nullifies the other one.
- I added some operation callback checks (JS_CHECK_OPERATION_LIMIT) to the parser. This is enough that all but two of the dump requests get handled immediately (before it was 13). I'm pretty sure the remaining two are due to Ion compilation, because in the last dump before the skipped two there is no asm.js code, and in the first one afterwards there is 84 MiB of it. So I need to find the long-running loop(s) in Ion and add operation callback checks to them.
- I ended up putting most of the code (including nsMemoryReporterManager::FindPeaks()) in nsJSEnvironment.cpp, because that file already imports tons of stuff; plus it has a pointer to the main JSRuntime just hanging around, which I needed to trigger the operation callback.
- It's only been tested on Linux, and uses various non-portable operations like pthread_create() and usleep(). It also spams stderr with various messages.
Comment 18 (Assignee) • 11 years ago
Here's an excerpt of the measured peak running the Citadel demo. Note the 1304 MiB of "temporary" (i.e. parse nodes) memory. The actual peak was about 100--150 MiB higher; I suspect that this measurement misses some memory that IonMonkey transiently allocated at the true peak.

1,711.08 MB (100.0%) -- explicit
├──1,416.63 MB (82.79%) -- js-non-window
│  ├──1,394.68 MB (81.51%) -- runtime
│  │  ├──1,304.00 MB (76.21%) ── temporary
│  │  ├─────84.91 MB (04.96%) -- code
│  │  │     ├──84.54 MB (04.94%) ── asm.js
│  │  │     └───0.38 MB (00.02%) ++ (6 tiny)
│  │  └──────5.77 MB (00.34%) ++ (10 tiny)
│  ├─────19.03 MB (01.11%) ++ zones
│  └──────2.92 MB (00.17%) ++ gc-heap
├────233.16 MB (13.63%) ── heap-unclassified
├─────32.11 MB (01.88%) -- window-objects
│     ├──18.89 MB (01.10%) -- top(file:///home/njn/Desktop/asm-js/EpicCitadel/index.html, id=15)
│     │  ├──18.69 MB (01.09%) -- active
│     │  │  ├──17.84 MB (01.04%) -- window(file:///home/njn/Desktop/asm-js/EpicCitadel/index.html)
│     │  │  │  ├──17.59 MB (01.03%) ++ js-compartment(file:///home/njn/Desktop/asm-js/EpicCitadel/index.html)
│     │  │  │  └───0.25 MB (00.01%) ++ (4 tiny)
│     │  │  └───0.85 MB (00.05%) ++ (2 tiny)
│     │  └───0.20 MB (00.01%) ++ js-zone(7fbdb8bf8800)
│     └──13.21 MB (00.77%) ++ (4 tiny)
└─────29.18 MB (01.71%) ++ (17 tiny)
Comment 19 • 11 years ago
Wild and presumptive idea: in this about:memory "peak finding mode", wouldn't it be cool if about:memory displayed a graph with time on the x axis and total explicit usage on the y axis, with nodes that you could click on to display the dumps? Even cooler if, say, window.dump/console.log messages showed up on this timeline, so that the user could instrument their code to see relatively when things were happening.
Comment 20 (Assignee) • 11 years ago
> - It'd be great if we could see more than just the max peak. E.g., right
> now I'm pretty sure the peak for the UE3 demos is the parse nodes, but there
> are 3 or more shorter peaks after that that I'd like to understand as well.
> Perhaps a large free() call could reset the peak measurement?
Measuring RSS every 200 ms I see two main peaks: first, it rises steadily to the parsing/compilation peak of about 2.1 GB, and then it briefly drops back to 530 MB before climbing to 1.3 GB (albeit with a small, temporary dip from 800 MB to 780 MB) while the demo is running.
I think having a "single peak-finding mode" and a "multiple peak-finding mode" might be useful. Sometimes you just want one peak, and you don't want to end up with lots of dumps in a session where memory consumption goes up and down a lot. But sometimes you really do want to know about multiple peaks, as you say. In multiple mode I could reset the peak if RSS drops by half, perhaps? Getting the multi-peak heuristic right could be tricky.
Comment 21 • 11 years ago
> Getting the multi-peak heuristic right could be tricky.
Perhaps we could do something like measure once every Y * {how long measuring takes} (perhaps Y=10) and also measure when RSS increases or decreases by at least Z% over the last time we measured?
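The second half of this heuristic — report when RSS moves by at least Z% in either direction since the last measurement — reduces to a small pure predicate. A sketch with made-up names:

```cpp
#include <cstddef>

// Sketch of the proposed heuristic: dump when the current RSS differs from
// the RSS at the last dump by at least `percent` percent, in either
// direction (a big drop is interesting too, since it may precede a new peak).
bool ShouldDump(size_t lastDumpRss, size_t currentRss, unsigned percent) {
    if (lastDumpRss == 0)
        return currentRss > 0;  // always take the first measurement
    size_t diff = currentRss > lastDumpRss ? currentRss - lastDumpRss
                                           : lastDumpRss - currentRss;
    // Integer-only comparison: diff/lastDumpRss >= percent/100.
    return diff * 100 >= lastDumpRss * static_cast<size_t>(percent);
}
```

Triggering on drops as well as rises is what distinguishes this from the single-peak mode in comment 20: each large drop effectively resets the baseline so that the next, possibly lower, peak still gets a dump.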
Comment 22 (Assignee) • 11 years ago
I'm putting this on the backburner. Here's the most recent patch, which applies cleanly on top of 134901:22c88d3d898a.
Updated (Assignee) • 11 years ago
Attachment #737337 - Attachment is obsolete: true
Comment 23 (Assignee) • 10 years ago
bkelly, I wonder if it would be useful to have the ability to trigger a memory report dump the first time RSS exceeds a certain value?
Flags: needinfo?(bkelly)
Comment 24 • 10 years ago
(In reply to Nicholas Nethercote [:njn] from comment #23)
> bkelly, I wonder if it would be useful to have the ability to trigger a
> memory report dump the first time RSS exceeds a certain value?

That would absolutely be valuable for situations where the process is getting OOM killed on a device. We could set the limit just under where the OOM occurs and then dump the memory report before it's killed. Ideally it would be the combination RSS+swap, though, as leaking processes on tarako tend to just inflate swap.

The two obstacles that come to mind are:
1) Where/when do we check for the increase? Ideally we want to catch the increase as soon as possible, but increases may not all come via malloc, etc.
2) How do we get the report across ADB to the user? Stick them on disk somewhere and then wait for the user to pull them later with ADB?

If you're asking me if I have time to work on this, however, I don't think I will for the foreseeable future.
Flags: needinfo?(bkelly)
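On Linux, the suggested RSS+swap quantity can be read from /proc/self/status by summing its VmRSS and VmSwap lines, both reported in kB. A standalone sketch with a made-up helper name:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Sketch (Linux-only, helper name made up): sums the VmRSS and VmSwap
// fields of /proc/self/status, i.e. the "RSS+swap" quantity, in kilobytes.
long RssPlusSwapKb() {
    std::ifstream status("/proc/self/status");
    std::string line;
    long totalKb = 0;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0 || line.rfind("VmSwap:", 0) == 0) {
            std::istringstream fields(line);
            std::string label, unit;
            long kb = 0;
            fields >> label >> kb >> unit;  // e.g. "VmRSS: 12345 kB"
            totalKb += kb;
        }
    }
    return totalKb;
}
```

This is costlier than the single-line /proc/self/statm read (the status file has dozens of lines), but for a one-shot threshold check rather than continuous sampling that hardly matters.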
Comment 25 (Assignee) • 10 years ago
> 1) Where/when do we check for the increase? Ideally we want to catch the > increase as soon as possible, but increases may not all come via malloc, etc. Previous comments have some discussion of this. > 2) How do we get the report across ADB to the user? Stick them on disk > somewhere and then wait for the user to pull them later with ADB? Something something get_about_memory.py mumble mumble. > If you're asking me if I have time to work on this, however, I don't think I > will for the foreseeable future. Nope, wasn't asking that. One of the difficulties of this bug is getting the peak-finding heuristics right. If we can just get the user to set a threshold and do one dump when we hit that, it makes things quite a bit simpler.
Updated (Assignee) • 3 years ago
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX