Closed Bug 931198 Opened 11 years ago Closed 10 years ago

about_memory fails to retrieve all files from b2g emulator

Categories

(Firefox OS Graveyard :: General, defect)

Hardware: x86_64
OS: Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(blocking-b2g:1.3+, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

Status: RESOLVED FIXED
Target Milestone: 1.3 C3/1.4 S3(31jan)
blocking-b2g: 1.3+
Tracking Status
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: rwood, Assigned: johns)

References

Details

Attachments

(3 files)

Attached file aboutmemorylogcat.txt
B2G/tools/get_about_memory.py fails to pull all the files off the b2g emulator if there are a large number of files (i.e. over 30). The error message is "Unable to pull some files". More details can be found in the attached terminal output and logcat. Not pulling DMD or GC/CC logs.

Easiest way to reproduce:

- Start the b2g emulator with default settings
- After startup, swipe right
- Click on the first application icon (Camera), wait for the app to start up completely, then return to the home screen to minimize it
- Repeat this for EACH application on the emulator - including those shown when you swipe right again to the second full screen of icons; don't forget the main apps at the very bottom too (Phone, etc.)
- Then run get_about_memory.py

Around 33 files will be found, but it always gets stuck after pulling 28 of them and fails with the above-noted message.

If the emulator RAM is increased from 512MB to 1024MB then this problem doesn't happen; all of the memory files are pulled fine. However, this is not a viable solution: if the RAM is changed, the emulator is no longer a valid representation of a Firefox OS device.

Note that this is blocking AWSY Firefox OS automation.
Kyle, can you take a look at this, or suggest someone who can?
Flags: needinfo?(khuey)
Is this something you have time to look into, John?
Flags: needinfo?(khuey) → needinfo?(jschoenick)
The logcat suggests that some content processes are crashing, probably OOMing given that higher memory limits fix this:
> I/Gecko   (   45): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv

Collecting memory reports in many processes simultaneously is likely to spike memory usage; I'm not sure what the best mitigation is for that. When I get a chance I can figure out how to build the b2g emulator to verify.
Flags: needinfo?(jschoenick)
See https://developer.mozilla.org/en-US/docs/Mozilla/Firefox_OS?menu for instructions on how to build for B2G; the target you want here is 'emulator'.
> Collecting memory reports in many processes simultaneously is likely to
> spike memory usage; I'm not sure what the best mitigation is for that.

It shouldn't be that bad.  Collecting the measurements takes little if any memory, since it's mostly a matter of traversing lots of data structures.  Writing to file takes a moderate amount because the gzipping needs a buffer, which I think is 32KB?
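For intuition, here is a minimal Python sketch of streaming a report through gzip with a small fixed buffer -- not the actual Gecko reporter code, just an illustration of why the file-writing side stays cheap (the 32KB chunk size is an assumption based on the figure above):

import gzip

CHUNK = 32 * 1024  # assumed buffer size; only this much is held before each write

def write_report_gz(lines, path):
    # Stream report lines into a gzipped file without building the whole
    # output in memory; gzip keeps only its internal window plus one chunk.
    with gzip.open(path, 'wb') as out:
        buf = b''
        for line in lines:
            buf += line.encode('utf-8')
            if len(buf) >= CHUNK:
                out.write(buf)
                buf = b''
        if buf:
            out.write(buf)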
Did you have time to look into this, :johns?  It's completely blocking progress on B2G memory tests.  If you don't have time, let me know so I can try to find another owner.
Flags: needinfo?(jschoenick)
I'm working on a few higher priority things right now, and I need to set aside time to get a B2G emulator/builds up and running to debug this -- I'm happy to take this, but I won't be able to get to it for a week or two.
Flags: needinfo?(jschoenick)
Blocks: 941809
(In reply to John Schoenick [:johns] from comment #8)
> I'm working on a few higher priority things right now, and I need to set
> aside time to get a B2G emulator/builds up and running to debug this -- I'm
> happy to take this, but I won't be able to get to it for a week or two.

Thanks; I'm going to assign this to you so it doesn't fall through the cracks; please unassign if you feel you won't be able to get to it in a week or two.
Assignee: nobody → jschoenick
Blocks: 899194
No longer blocks: 941809
This is a serious memory failure and we would like a fix as soon as possible.
blocking-b2g: --- → 1.3+
Flags: needinfo?(jschoenick)
I set up a B2G build environment and am looking into this now.
Flags: needinfo?(jschoenick)
So this is indeed due to an OOM - after opening all processes we often have only a few megs of free memory, and asking every process to dump memory triggers a spike that results in some processes being killed:
> <4>select 524 (Twitter share), adj 14, size 3986, to kill
> <4>select 579 (Music2), adj 14, size 4529, to kill
> <4>send sigkill to 579 (Music2), adj 14, size 4529
> <4>select 143 (Calendar), adj 12, size 4761, to kill
> <4>select 524 (Twitter share), adj 14, size 3941, to kill
> <4>select 579 (Music2), adj 14, size 4471, to kill
> <4>send sigkill to 579 (Music2), adj 14, size 4471
> <4>select 143 (Calendar), adj 12, size 4890, to kill
> <4>select 524 (Twitter share), adj 14, size 3951, to kill
> <4>send sigkill to 524 (Twitter share), adj 14, size 3951

In this case it attempts to kill Music2 (twice?) and Twitter share - and only 30 of the 32 expected reports are pulled. I will see if I can find any low hanging fruit for minimizing the amount of memory needed to dump all reports, but it is probably unrealistic to expect to dump all memory reports without any allocations.
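To make these kills easier to spot while experimenting, a rough sketch like the following can diff the kernel log around a dump (assuming adb is on PATH and the emulator is the only attached device; the 'send sigkill' marker matches the lowmemorykiller lines quoted above):

import subprocess

def lmk_kills():
    # Return the lowmemorykiller 'send sigkill' lines from the emulator's kernel log.
    dmesg = subprocess.check_output(['adb', 'shell', 'dmesg']).decode('utf-8', 'replace')
    return [line for line in dmesg.splitlines() if 'send sigkill' in line]

before = set(lmk_kills())
# ... trigger the dump here, e.g. by running get_about_memory.py ...
after = set(lmk_kills())
for line in sorted(after - before):
    print('killed during dump: ' + line)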
> I will see if I can find any low
> hanging fruit for minimizing the amount of memory needed to dump all
> reports, but it is probably unrealistic to expect to dump all memory reports
> without any allocations.

You should read bug 943744. AFAIK the memory spike during dumping is almost entirely due to the notable string detection. I improved things on that front in that bug, though there is still a spike.
(In reply to Nicholas Nethercote [:njn] from comment #13)
> You should read bug 943744. AFAIK the memory spike during dumping is almost
> entirely due to the notable string detection. I improved things on that
> front in that bug, though there is still a spike.

Can this be reasonably backported to the b2g branches?
Flags: needinfo?(n.nethercote)
So dumping memory reports necessarily uses some memory; even with bug 943744 there will be the edge case where sending the dump signal OOMs a process. Is adjusting the harness to exclude these processes from the dumps the host expects to receive, and possibly noting in the resulting files that some processes died during memory reporting, enough to solve this for AWSY-B2G's purposes?
Flags: needinfo?(rwood)
(In reply to John Schoenick [:johns] from comment #15)
> So dumping memory reports necessarily uses some memory; even with bug 943744
> there will be the edge case where sending the dump signal OOMs a process. Is
> adjusting the harness to exclude these processes from the dumps the host
> expects to receive, and possibly noting in the resulting files that some
> processes died during memory reporting, enough to solve this for AWSY-B2G's
> purposes?

I think it's fine to do that; we already sometimes encounter this situation on low-memory devices like the Tarako.
> Can this be reasonably backported to the b2g branches?

I think so. Try it, and if the patches don't apply I'll take a look to see if the conflicts are bad.
Flags: needinfo?(n.nethercote)
Thanks guys. So if I'm understanding correctly, using the get_about_memory script may actually kill some running processes on the emulator? Which processes can be killed? The plan is to run the test for 6 hours straight each time, and grab about_memory logs at various intervals, without restarting the emulator.

I'm wondering, if some of the emulator/gaia processes are killed while dumping the memory reports, will that have an adverse effect on the emulator/gaia itself? Wouldn't this potentially make the resulting data inconsistent each time the test is performed? Is it ok to leave the emulator running after dumping the data? Or will I need to just grab the about_memory once at the end of the test duration? Thanks - just want to make sure I am understanding correctly. :)
Flags: needinfo?(rwood) → needinfo?(jschoenick)
PM triaged this bug and believes that it should be a blocker.
(In reply to Robert Wood [:rwood] from comment #18)
> Thanks guys. So if I'm understanding correctly, using the get_about_memory
> script may actually kill some running processes on the emulator? Which
> processes can be killed? The plan is to run the test for 6 hours straight
> each time, and grab about_memory logs at various intervals, without
> restarting the emulator.

When the device runs out of memory, the OOM killer picks an old background process to terminate. I'm not very familiar with this code, but it's my understanding that this happens often in normal use -- e.g. if the foreground app tried to do something memory-intensive at this moment, some background tasks would be killed to free up space. In this case, attempting to dump all memory reports is the trigger, rather than some user action.

> I'm wondering, if some of the emulator/gaia processes are killed while
> dumping the memory reports, will that have an adverse effect on the
> emulator/gaia itself?

Since old background tasks are culled in normal operation, this should be no more harmful than if a memory intensive app were opened at that point.

> Wouldn't this potentially make the resulting data
> inconsistent each time the test is performed?

I'm not sure how deterministic the OOM killer is, but desktop AWSY, for instance, is not very deterministic. So, it may be the case that more/fewer processes are still around when we get to the checkpoint, even before considering potential OOM deaths.

> Is it ok to leave the emulator
> running after dumping the data? Or will I need to just grab the about_memory
> once at the end of the test duration? Thanks - just want to make sure I am
> understanding correctly. :)

Since OOM killing is normal, there should be no detriment to this happening. The test simply needs to be aware that the act of measuring memory may OOM-kill some old background apps -- e.g. using the dumps to graph remaining free memory would be misleading, since taking the dump itself freed up a few megs in order to run. (You could still get that datapoint, however, by e.g. querying meminfo before taking a full dump.)
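For example, a small sketch of grabbing that datapoint first (a hypothetical helper, assuming adb is on PATH and the emulator is the only attached device):

import subprocess

def free_memory_kb():
    # Read MemFree from the emulator's /proc/meminfo, in kB, before any dump runs.
    out = subprocess.check_output(['adb', 'shell', 'cat', '/proc/meminfo']).decode()
    for line in out.splitlines():
        if line.startswith('MemFree:'):
            return int(line.split()[1])
    return None

print('MemFree before dump: %s kB' % free_memory_kb())
# ...then run get_about_memory.py to collect the full reports.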
Flags: needinfo?(jschoenick)
:johns Excellent, thank you for the info!
@rwood: So bug 943744, which reduces the overhead of these memory spikes (but doesn't 100% solve the issue, as noted), landed on 28 -- for AWSY-B2G's purposes, do we care about uplifting that to the B2G26 branches (1.2, 1.2f)?
Flags: needinfo?(rwood)
No, we don't intend to run the AWSY test on 1.2x.
Flags: needinfo?(rwood)
This inlines _wait_for_remote_files, which is only used in this spot, and checks inside the loop for child PIDs that vanished. The implications of children vanishing are discussed in comment 20.
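Not the attached patch itself, but a rough sketch of the idea; live_pids() and pids_with_reports() are hypothetical callables standing in for the script's device queries via adb:

import time

def wait_for_remote_files(expected_pids, live_pids, pids_with_reports, timeout=60):
    # Wait for a memory report per child, dropping children that vanish mid-dump
    # instead of blocking forever on a report that will never arrive.
    deadline = time.time() + timeout
    remaining = set(expected_pids)
    while remaining and time.time() < deadline:
        for pid in remaining - set(live_pids()):
            # The child was OOM-killed (or otherwise exited); stop waiting for it.
            print('Warning: Child %d exited during memory reporting' % pid)
            remaining.discard(pid)
        remaining -= set(pids_with_reports())
        if remaining:
            time.sleep(1)
    return remaining  # anything still here timed out rather than finishing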
Attachment #8366150 - Flags: review?(khuey)
Attachment #8366150 - Flags: review?(khuey) → review+
Keywords: checkin-needed
https://github.com/mozilla-b2g/B2G/commit/64ff91b94cfbddf7d1af813a4aa64f4817d63dde
Status: NEW → RESOLVED
Closed: 10 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
John, can you please assist with this uplift? :)
Flags: needinfo?(jhford)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #26)
> John, can you please assist with this uplift? :)

There are no uplifts for B2G, sadly.  The contents of this repository are shared between all branches.
Flags: needinfo?(jhford)
:johns

Hi, I've noticed lately while using get_about_memory with the emulator that Gaia on the emulator reports the homescreen has crashed (crash reporter) and the terminal notes "Warning: Child 155 exited during memory reporting". I'm assuming this is expected, as noted in comment 20? It does seem to recover ok. Just want to verify this is expected and not a new regression, thanks.
Flags: needinfo?(jschoenick)
(In reply to Robert Wood [:rwood] from comment #28)
> :johns
> 
> Hi, I've noticed lately while using get_about_memory with the emulator that
> Gaia on the emulator reports the homescreen has crashed (crash reporter) and
> the terminal notes "Warning: Child 155 exited during memory reporting". I'm
> assuming this is expected, as noted in comment 20? It does seem to recover
> ok. Just want to verify this is expected and not a new regression, thanks.

This sounds like a b2g bug that should be filed. It's expected that the phone kills old background tasks when it needs more RAM, but I doubt we intend to nuke the home screen with an error dialog :-/
Flags: needinfo?(jschoenick)
(In reply to John Schoenick [:johns] from comment #29)
> This sounds like a b2g bug that should be filed. It's expected that the
> phone kills old background tasks when it needs more RAM, but I doubt we
> intend to nuke the home screen with an error dialog :-/

The homescreen can get killed by the LMK when we're low on memory, but this won't trigger the crash screen. It sounds to me like we hit an actual bug here, maybe a defective memory reporter that segfaults the homescreen when computing memory usage. To figure out what's going on we could hook up gdb to the homescreen process, let it run, then run get_about_memory.py and see where things go boom.
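A rough sketch of that setup (assuming gdbserver is present on the emulator image and adb is on PATH; the process-name lookup assumes the classic Android toolbox ps column layout of USER PID PPID ... NAME, so adjust if yours differs):

import subprocess

def find_pid(name):
    # Find a b2g child process PID by its process name, e.g. 'Homescreen'.
    for line in subprocess.check_output(['adb', 'shell', 'ps']).decode().splitlines():
        fields = line.split()
        if fields and fields[-1] == name:
            return fields[1]
    return None

pid = find_pid('Homescreen')

# Attach gdbserver on the device and forward the debug port to the host.
subprocess.Popen(['adb', 'shell', 'gdbserver', ':5039', '--attach', pid])
subprocess.check_call(['adb', 'forward', 'tcp:5039', 'tcp:5039'])

# On the host: run gdb against the b2g binary, 'target remote :5039', 'continue',
# then kick off get_about_memory.py and wait for the crash.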
Thanks guys. I created Bug 979879 for this new issue.
Target Milestone: --- → 1.3 C3/1.4 S3(31jan)