Make memory-report gathering work on an unrooted B2G device

RESOLVED FIXED

Status

RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

Tracking

unspecified

Firefox Tracking Flags

(blocking-b2g:tef+, firefox22 wontfix, firefox23 wontfix, firefox24 fixed, b2g18 verified, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 verified, b2g-v1.1hd fixed)

Details

(Whiteboard: [tef-triage])

Attachments

(3 attachments, 2 obsolete attachments)

(Assignee)

Description

6 years ago
Right now get_about_memory.py only works on rooted devices (it sends a signal to the B2G process, which requires root).

In theory you can pull an about:memory report of an unrooted device by writing to a fifo that's globally-writable.  We need to modify the script to do this and check that it actually works.
(Assignee)

Comment 1

6 years ago
I'm not going to have time to work much this week (I'm moving), so maybe someone else can do this.  The FIFO is /data/local/debug_info_trigger.  You should be able to write

 "memory report"
 "minimize memory report"
 "gc log"

to the fifo.  See nsMemoryInfoDumper.cpp.
(Assignee)

Comment 2

6 years ago
Created attachment 755664 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/B2G/pull/249

Pointer to Github pull-request
(Assignee)

Comment 3

6 years ago
Comment on attachment 755664 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/B2G/pull/249

I've tested this as best I can, but after we land it we're going to need QA to test this on actual, honest-to-God user builds.
Attachment #755664 - Flags: review?(dhylands)
(Assignee)

Updated

6 years ago
Assignee: nobody → justin.lebar+bug
Comment on attachment 755664 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/B2G/pull/249

Looks good to me.
Attachment #755664 - Flags: review?(dhylands) → review+
(Assignee)

Comment 5

6 years ago
Merged: https://github.com/mozilla-b2g/B2G/commit/6fc34102a752897a73fbbe08a98046b1715e2f44

QA: I haven't been able to get an honest-to-God user build, so I need your help verifying this fix.  This is a really important fix to verify because if this doesn't work, we can't get memory reports off a phone in the wild.  These memory reports are crucial for understanding a variety of different bugs.

Here's how to do it:

1) Take a user build of a device.  It can be a build from last week or earlier, because this change doesn't affect the bits on the phone.

2) On mac or linux, run

  $ git clone https://github.com/mozilla-b2g/B2G

or, if you already have the B2G repository cloned update your tree (with git pull or git fetch and git merge).

3) $ cd B2G/tools

4) $ ./get_about_memory.py

This should output something like

> Got 5/5 files.
> Pulled files into about-memory-1.

It might not say 5/5; it may be 3/3 or something.  That's fine.

5) Then please zip up the about-memory-N directory that's generated and attach it to this bug.
Keywords: qawanted
(Assignee)

Updated

6 years ago
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED

Updated

6 years ago
QA Contact: jsmith
And, more specifically, this particular test requires adb to be enabled, and we'd like to confirm that the adb shell uses a non-root accoumt.

You can tell by doing:

adb shell id

If it reports:

uid=0(root) gid=0(root)

then its a root shell, which is what most of us developers have. We're more interested in a test from a phone which reports something like:

uid=2000(shell) gid=2000(shell) groups=...

jlebar: so I tried it on my leo phone (which comes up with the shell user by default) and I got the following:

2189 >./get_about_memory.py 
Got 2/2 files.
Pulled files into about-memory-0.
Remote command /system/bin/toolbox "rm" /data/local/tmp/memory-report-1370139410-437.json.gz /data/local/tmp/memory-report-1370139410-149.json.gz failed with error code 255
rm failed for /data/local/tmp/memory-report-1370139410-437.json.gz, Operation not permitted

Traceback (most recent call last):
  File "./get_about_memory.py", line 259, in <module>
    get_and_show_dump(args)
  File "./get_about_memory.py", line 177, in get_and_show_dump
    (merged_reports_path, dmd_files) = get_dumps(args)
  File "./get_about_memory.py", line 174, in get_dumps
    return utils.run_and_delete_dir_on_exception(do_work, out_dir)
  File "/home/work/B2G-leo/tools/include/device_utils.py", line 142, in run_and_delete_dir_on_exception
    return fun()
  File "./get_about_memory.py", line 160, in do_work
    optional_outfiles_prefixes=['dmd-'])
  File "/home/work/B2G-leo/tools/include/device_utils.py", line 216, in notify_and_pull_files
    _remove_files_from_device(all_outfiles_prefixes, old_files)
  File "/home/work/B2G-leo/tools/include/device_utils.py", line 310, in _remove_files_from_device
    remote_toolbox_cmd('rm', ' '.join([str(f) for f in files_to_remove]))
  File "/home/work/B2G-leo/tools/include/device_utils.py", line 56, in remote_toolbox_cmd
    return remote_shell('/system/bin/toolbox "%s" %s' % (cmd, args))
  File "/home/work/B2G-leo/tools/include/device_utils.py", line 39, in remote_shell
    raise subprocess.CalledProcessError(retcode, cmd, cmd_out)
subprocess.CalledProcessError

ls -l /data/local/tmp reports:

prw-rw-rw- root     root              2013-06-01 19:15 fifo_cmd
prw-rw-rw- root     root              2013-06-01 19:15 fifo_dat
-rw-r----- root     root        24592 2013-06-01 19:16 memory-report-1370139410-149.json.gz
-rw-r----- app_437  app_437     13264 2013-06-01 19:16 memory-report-1370139410-437.json.gz
-rw-rw-rw- root     root           13 2013-06-01 19:15 profile_calib_a
-rw-rw-rw- root     root            0 2013-06-01 19:14 profile_calib_m
---------- root     root        12288 2013-06-01 19:15 sensord.log

So, I think that the memory-report files are having their permissions masked by the umask in effect from /system/bin/b2g.sh (which does a umask 027)

This means that we need to change the permissions on the files after we create them. Incidently, while it said it pulled 2 files, there was no local about-memory-0 directory. This makes sense, since the shell user can't read the files, so the pull would have failed.

I'll clear qawanted until we have a patch that at least works on leo.
Keywords: qawanted
QA Contact: jsmith
(Assignee)

Comment 7

6 years ago
Created attachment 755758 [details] [diff] [review]
Untested patch (does seem to compile)
(Assignee)

Comment 8

6 years ago
Comment on attachment 755758 [details] [diff] [review]
Untested patch (does seem to compile)

Would you mind testing this, Dave?  I'm hopeful this is all we need to do.
Attachment #755758 - Flags: feedback?(dhylands)
(Assignee)

Comment 9

6 years ago
I'm going to morph this bug into the issue of making memory reports work on an unrooted device.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Modify B2G/tools/get_about_memory.py to work on an un-rooted device → Make memory-report gathering work on an unrooted B2G device
(Assignee)

Comment 10

6 years ago
This is a critical blocking issue: Without this fix, we will be unable to retrieve memory reports off devices.  These memory reports are a key tool for our understanding of a large class of bugs.  If we can't make memory reports work, we will have no insight into these bugs, which will compromise our ability to improve new versions of the product.

It's likely that whatever changes we have to make here will be very safe: They should only affect code which is run when dumping memory reports.  The worst thing that should be able to happen with a patch here is that we break memory-report dumping.  But it's already broken, so that's a no-brainer risk to take.
blocking-b2g: --- → tef?
Comment on attachment 755758 [details] [diff] [review]
Untested patch (does seem to compile)

Review of attachment 755758 [details] [diff] [review]:
-----------------------------------------------------------------

So this change makes the files retrievable, however they're still not removable.

The issue is that /data/local/tmp is created as a true temporary directory and has the sticky bit set (if you look in /system/bin/b2g.sh you'll see the line:
chmod 1777 $TMPDIR

The 1 is the sticky-bit. It allows the /data/local/tmp directory to be world-writable, but only allows the process which owns the file to remove it.

So,if we were to create a /data/local/tmp/memory-reports directory and give it 0666 permissions and put the memory-report*.gz files in that directory, then the get_about_memory_.py could remove the files within that directory, but would have to leave the directory itself around, which I think is a reasonable compromise.
Attachment #755758 - Flags: feedback?(dhylands) → feedback+
I said that only the process which owns the file can remove it (in a directory with the sticky bit set), but I meant only the user which owns the file can remove it.
Lukas, can you decide on this one?
Whiteboard: [tef-triage]
(Assignee)

Comment 14

6 years ago
What if we put it in /data/local instead?
/data/local/tmp is the only location writable by the child content processes.

shell@android:/ $ ls -ld /data/local
drwxr-x--x root     root              2013-06-01 21:15 local

So you still wouldn't be able to remove it. Note that /data/local/tmp gets cleared on a reboot.
(Assignee)

Comment 16

6 years ago
Don't I want 777 on the directory, not 666?  Otherwise I can list the directory but not read the files inside?
(In reply to Justin Lebar [:jlebar] from comment #16)
> Don't I want 777 on the directory, not 666?  Otherwise I can list the
> directory but not read the files inside?

Doh - of course :) What was I thinging (never mind - don't answer that).
Following up in email.
(Assignee)

Comment 19

6 years ago
Created attachment 756200 [details] [diff] [review]
Patch, v1

Are you up for reviewing this, Dave?
Attachment #756200 - Flags: review?(dhylands)
(Assignee)

Updated

6 years ago
Attachment #755758 - Attachment is obsolete: true
(Assignee)

Updated

6 years ago
Attachment #755664 - Attachment is obsolete: true
(Assignee)

Comment 20

6 years ago
Created attachment 756202 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/B2G/pull/251

Pointer to Github pull-request
Comment on attachment 756200 [details] [diff] [review]
Patch, v1

Review of attachment 756200 [details] [diff] [review]:
-----------------------------------------------------------------

I tested this on my leo device with a "shell user" adb shell (as opposed to a root user adb shell) and it all seems to be working fine).

I saw that the files got removed from /data/local/tmp/memory-reports.

firefox was launched with the memory report. I'll attach it for completeness.

I also wound up being slightly confused about the directory on the target being named "memory-reports" and the file on the host being named "memory-reports", but that's pretty minor.

::: xpcom/base/nsMemoryInfoDumper.cpp
@@ +728,5 @@
> +  nsAutoCString dirPath;
> +  rv = (*aFile)->GetNativePath(dirPath);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  while (chmod(dirPath.get(), 0777) == -1 && errno == EINTR) {}

Since we're using ANDROID, we could use:

TEMP_FAILURE_RETRY(chmod(dirPath.get(), 0777));

which wraps up the retry on -1 and EINTR. I'll leave it up to you to decide if you want to change this or not.

TEMP_FAILURE_RETRY can be found in bionic/libc/include/unistd.h (so #include <unistd.h>)

@@ +747,5 @@
>      nsAutoCString path;
>      rv = file->GetNativePath(path);
>      NS_ENSURE_SUCCESS(rv, rv);
>  
> +    while (chmod(path.get(), 0666) == -1 && errno == EINTR) {}

Same thing here.
Attachment #756200 - Flags: review?(dhylands) → review+
Created attachment 756243 [details]
memory-reports summary file collected from my leo device (running on master)
oh yeah - the gecko patch attached assumes that the obsoleted patch is also applied) - just an FYI - not a problem if you're landing yourself.
(Assignee)

Updated

6 years ago
Attachment #756202 - Flags: review?(dhylands)
(Assignee)

Comment 24

6 years ago
I guess I'd rather write portable code, even if it's Android only.  The macro doesn't save a lot of space.
Attachment #756202 - Flags: review?(dhylands) → review+
(In reply to Justin Lebar [:jlebar] from comment #24)
> I guess I'd rather write portable code, even if it's Android only.  The
> macro doesn't save a lot of space.

The macro is also available from glibc. I think it may not have been available in earlier versions of android, but appears to be available now.

It also appears that we have MOZ_TEMP_FAILURE_RETRY:
https://mxr.mozilla.org/mozilla-central/source/xpcom/glue/FileUtils.h#157

What you've coded is fine. I just wanted to point this out for future reference (using the macro, you at least know it's coded properly).
(Assignee)

Comment 27

6 years ago
Oh, if we have a macro in Mozilla code, that's nice.

Although it doesn't check for EAGAIN (which isn't an issue here, but is an issue with some other syscalls).

There's another chmod in this file that needs EINTR treatment.  When we regain our sanity, we can convert them all to that macro.
(Assignee)

Comment 28

6 years ago
Alex, if I understood correctly, you said in an e-mail that this could be tef+.  Would you mind setting the flag?
(In reply to Justin Lebar [:jlebar] from comment #27)
> Oh, if we have a macro in Mozilla code, that's nice.
> 
> Although it doesn't check for EAGAIN (which isn't an issue here, but is an
> issue with some other syscalls).
> 
> There's another chmod in this file that needs EINTR treatment.  When we
> regain our sanity, we can convert them all to that macro.

EAGAIN should only really apply to files which are opened non-blocking. I would like to believe that we're not using non-blocking I/O unless we absolutely need to, since non-blocking I/O implies polling.

Using blocking I/O in conjunction with select/poll would be a better way to go.

I'm just speaking in generalities here, but I figured I say my piece:)
We'll tef+ this for any remaining 1.0.1 downstream partners.
blocking-b2g: tef? → tef+
Dave, Justin,
When this is uplifted, please add keywords: qawanted, verifyme.   QA can work through verifying the fixes, assuming comment 5 is still the correct way to test it.  if not, please add new steps to test as well.   Thanks.
https://hg.mozilla.org/mozilla-central/rev/c481c247673e
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
Took some fixing up to land on b2g18, but it appears to build OK. You still might want to take a look to make sure everything works as intended.

https://hg.mozilla.org/releases/mozilla-b2g18/rev/c952a123b902
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/bb73520e9dce
status-b2g18: --- → fixed
status-b2g18-v1.0.0: --- → wontfix
status-b2g18-v1.0.1: --- → fixed
status-b2g-v1.1hd: --- → affected
status-firefox22: --- → wontfix
status-firefox23: --- → wontfix
status-firefox24: --- → fixed
Keywords: qawanted, verifyme

Updated

6 years ago
Keywords: qawanted

Updated

6 years ago
QA Contact: jsmith
Confirmed working using the STR in comment 5 with my Inari 6/4 b2g18 build. I got my memory report successfully from my device and viewed it in the browser.
status-b2g18: fixed → verified
Also confirmed working on a partner customized 1.01 Ikura build from 6/4. I got my memory report successfully from my device and viewed it in the browser.
status-b2g18-v1.0.1: fixed → verified
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.