[MTBF][Memory Report] Memory report pulling causing processes to exit

Status: RESOLVED FIXED
Component: Toolkit :: about:memory
Reporter: ypwalter (Unassigned)
Blocks: 1 bug
Firefox Tracking Flags: (Not tracked)
Whiteboard: [MemShrink:P1]
Opened: 3 years ago; Last resolved: 2 years ago

Description (Reporter, 3 years ago)

When pulling memory reports on B2G, child processes exit frequently. Is there a way to ease this pressure?

Got 0/10 files.
Warning: Child 4544 exited during memory reporting
10:49:00 
Warning: Child 6920 exited during memory reporting
10:49:00 
Warning: Child 4927 exited during memory reporting
10:49:00 
Warning: Child 4470 exited during memory reporting
10:49:00 
Warning: Child 4809 exited during memory reporting
10:49:01 
Got 0/5 files.
Warning: Child 7286 exited during memory reporting
10:49:03 
Got 0/4 files.
Got 1/1 files.
Updated (Reporter, 3 years ago)
Blocks: 990888
Comment 1 (Jed)

It would be good to confirm that the processes are exiting because of the memory pressure of memory reporting, rather than for some other reason.

There is definitely room for improvement here — PMemoryReportRequest sends the child's entire memory report as a single array rather than streaming it.  This is a known issue and probably not too difficult to fix; it just hadn't been reported as being enough of a problem in practice to give it high priority.
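
The single-array transfer described above can be sketched as follows (a hypothetical Python illustration, not the actual C++/IPDL `PMemoryReportRequest` code; all names here are illustrative): sending everything at once forces the child to hold the entire serialized payload in memory, while streaming in chunks bounds the peak overhead.

```python
# Hypothetical sketch of batch vs. streamed report transfer; the real
# PMemoryReportRequest protocol is C++/IPDL, and these names are made up.

def send_as_single_array(reports):
    """Materialize every serialized report, then send one big message.
    Peak memory in the child is proportional to the total report size."""
    return [[str(r) for r in reports]]  # one message containing everything

def send_streamed(reports, chunk_size=64):
    """Yield reports in small batches; peak memory is roughly one chunk."""
    chunk = []
    for r in reports:
        chunk.append(str(r))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```

Either way the parent ends up with the same reports; the difference is only the child's transient footprint while sending.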

Beyond that, if there's more per-child-process overhead that's not as easy to deal with, it might make sense to consider collecting child process reports serially (or at least limiting the concurrency to the physical CPU count) to keep the peak overhead down.
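
The concurrency-limiting idea could look roughly like this (a hypothetical Python sketch, not Gecko code; `collect_one` and the overall shape are assumptions): query at most N children at a time instead of all of them at once.

```python
# Hypothetical sketch: collect per-child memory reports with bounded
# concurrency, so the transient reporting overhead is not paid for every
# child process simultaneously. Not actual Gecko code.
import os
from concurrent.futures import ThreadPoolExecutor

def collect_reports(children, collect_one, max_parallel=None):
    """Run collect_one(child) for each child, at most max_parallel at a
    time (defaulting to the CPU count); max_parallel=1 is fully serial."""
    if max_parallel is None:
        max_parallel = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(collect_one, children))
```

Setting `max_parallel=1` gives the fully serial behavior; the physical CPU count is a middle ground between peak overhead and wall-clock time.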
Whiteboard: [MemShrink]
In particular, it would be nice to verify that the children aren't crashing, like in bug 1125490.

Updated (3 years ago)
Depends on: 1149085
No longer depends on: 1149085
Depends on: 1149085

Updated (3 years ago)
Whiteboard: [MemShrink] → [MEMSHRINK:P2]

Updated (3 years ago)
Whiteboard: [MEMSHRINK:P2] → [MemShrink:P1]

Comment 3 (Eric Rahm [:erahm])

After discussing this in #memshrink we bumped it to P1; we should make our best effort not to increase memory consumption while performing memory testing (regardless of whether it causes OOMs). I've definitely seen OOMs on low-end devices in the past that were pretty clearly triggered by generating a memory report.

There are several potential ways we can address this as noted by Jed in comment 1, as well as a few more:
- Stream the memory reports from child to parent
- Do not perform memory reports in parallel
- Go back to writing the individual reports to files, merging them (or not) in the parent process
- Reduce the overall size of memory reports
  - The reports are currently rather verbose, with thousands of copies of descriptions; perhaps add an option to omit those on B2G
  - If we serialize to a file instead, we might want to look at making the JSON format more terse, or switch to a binary format
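
The description-deduplication idea in the last bullet could be sketched like this (hypothetical; the real format is the about:memory JSON, and these field names are simplified): store each distinct description string once and reference it by index from every report entry.

```python
# Hypothetical sketch of shrinking memory reports by interning the
# heavily repeated description strings; field names are simplified.
def dedupe_descriptions(reports):
    descriptions, index = [], {}   # string table and string -> id map
    slim = []
    for r in reports:
        desc = r["description"]
        if desc not in index:
            index[desc] = len(descriptions)
            descriptions.append(desc)
        slim.append({"path": r["path"], "amount": r["amount"],
                     "desc_id": index[desc]})
    return {"descriptions": descriptions, "reports": slim}
```

Since most reporters emit the same description for many paths, the string table stays small while each entry carries only a small integer.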
Depends on: 1151597
Filed bug 1151597 for getting rid of the big array in PMemoryReportRequest.
(In reply to Eric Rahm [:erahm] from comment #3)
> I've definitely seen OOMs on low-end devices in the past that
> were pretty clearly triggered by generating a memory report.

I assume those devices were using swap in the form of zRAM.  Testing on my Flame suggests that that's responsible for a lot of this: reporting a process's memory causes a certain amount of swap-in, and reporting every process's memory at once (when free/evictable memory is already scarce) causes a few to be OOM-killed — and the resulting state has about the same amount of free and cached memory as before, but a lot more free swap.  If I disable zRAM (and adjust the amount of RAM correspondingly), it's very hard to reproduce this even without any other changes.

Which brings us to:

> - Do not perform memory reports in parallel

I haven't implemented this yet, but I suspect it will be a much bigger win than bug 1151597 (which I have implemented — empirically it doesn't seem to help much, compared to switching from zRAM to real RAM).

>   - The reports currently are rather verbose with thousands of copies of
> descriptions, perhaps add an option to omit those on b2g
>   - If we serialize to a file instead we might want to look at making the
> json format more terse / switch to a binary format

I'm not so sure this is significant, at least for B2G — even with 10 child processes, the uncompressed JSON is only 3-4 MiB.

Comment 6 (3 years ago)

After 22 hours of MTBF runs we saw the messages below, which caused MTBF to exit with an exception.

Got 0/9 files.
Warning: Child 19600 exited during memory reporting
Warning: Child 19505 exited during memory reporting
Warning: Child 19690 exited during memory reporting
Warning: Child 17508 exited during memory reporting
Warning: Child 19903 exited during memory reporting
Got 0/4 files.
Warning: Child 18404 exited during memory reporting
Warning: Child 20227 exited during memory reporting
Warning: Child 2132 exited during memory reporting
Warning: Child 1295 exited during memory reporting
Got 0/0 files.
Build was aborted
Archiving artifacts
Command adb shell 'echo -n "gc log" > "/data/local/debug_info_trigger"; echo -n "|$?"' failed with error code 143
Terminated
Pulled files into /var/jenkins/workspace/flamekk.v2.2.moztwlab01.319.mtbf_op@7/label/moztwlab-01/output_164/about-memory43.
Failed to retrieve memory reports
Pulling GC/CC logs...
Crash in get-about-memory
Depends on: 1154053
More observations:

1. MinimizeMemoryUsage actually increases memory usage, substantially, when called on a mostly swapped-out process — enough that trying to minimize the parent process can kill several children (i.e., serialization won't help here).  But it's not used by default.

2. Verbose GC/CC logs, which are the default for get_about_memory.py, seem to have the same concurrency problem as memory reports, and at a roughly similar magnitude.  Abbreviated GC/CC logs seem to not be a problem.

Updated (3 years ago)
OS: Linux → Gonk (Firefox OS)
Hardware: x86_64 → ARM
All blocking bugs have been fixed.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED