Closed Bug 828285 Opened 11 years ago Closed 5 years ago

Print better error messages from process_util_linux.cc

Categories

(Core :: IPC, defect, P5)

All
Linux
defect

Tracking

()

RESOLVED WONTFIX
mozilla21

People

(Reporter: cjones, Unassigned)

References

Details

(Whiteboard: [good first bug])

Attachments

(1 file)

Currently all the error messages are debug-only and don't include useful information like the PID or errno.  The failures in process management code are subtle and in some cases can be intermittent, so any debugging help the logging can give is extremely valuable.
This is a followup from bug 828115, which runs app processes needing camera permissions with reduced OS privileges.
Depends on: 828115
https://hg.mozilla.org/mozilla-central/rev/4b34af2ea374
Assignee: nobody → jones.chris.g
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
(Sorry, I bollocksed up the bug # in commit message.)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Not working on this.)
Assignee: jones.chris.g → nobody
Log error messages whenever an error occurs in process_util_linux.cc using gProcessLog.print(). Two questions:

1. Should something else be used for logging other than gProcessLog.print()? I used that one since it was already used in some places of the code.

2. There are also process_util_bsd.cc, process_util_mac.mm, process_util_posix.cc and process_util_win.cc. Should I do similar changes to those files too as part of this bug?
Assignee: nobody → vd
Status: REOPENED → ASSIGNED
Attachment #730264 - Flags: review?
Attachment #730264 - Flags: review? → review?(dhylands)
Comment on attachment 730264 [details] [diff] [review]
Preliminary patch

Review of attachment 730264 [details] [diff] [review]:
-----------------------------------------------------------------

Technically, I'm not a peer here, so I'd recommend that you get bent or bsmedberg to review the next go-round.

::: i/ipc/chromium/src/base/process_util_linux.cc
@@ +207,5 @@
>    env.Merge(env_vars_to_set);
>    char * const *envp = env.AsEnvp();
>    if (!envp) {
> +    gProcessLog.print("Error launching %s: while duplicating environment: %s",
> +                      argv[0].c_str(), strerror(errno));

So why are we changing loggers? (this is mostly for my own education - I don't know that what you've done is good or bad)

gProcessLog seems to be using a totally different logging mechanism from DLOG.

These errors also seem really serious, so personally, I think it would be useful if these were logged to android's log, if that's available.

I also think that we should print the errno numeric value along with the strerror string.

Also, I noticed that gProcessLog essentially winds up doing a vfprintf using the format string, so these logs should have \n at the end.

@@ +227,5 @@
>        fd_shuffle2.push_back(InjectionArc(it->first, it->second, false));
>      }
>  
> +    if (!ShuffleFileDescriptors(&fd_shuffle1)) {
> +      gProcessLog.print("Error launching %s: ShuffleFileDescriptors(): %s",

We should probably put a comment here. We shouldn't be allocating memory between the fork and the exec, but it seems to me that this could.

We're going to fail anyways, so the log might succeed, and is probably still worth leaving in.

@@ +288,5 @@
>    {
>      static bool checked_pix_max, pix_max_ok;
>      if (!checked_pix_max) {
>        checked_pix_max = true;
> +      static const char* fname = "/proc/sys/kernel/pid_max";

Why static?

@@ +303,3 @@
>          _exit(127);
>        }
> +      close(fd);

Good catch!

@@ +323,4 @@
>        _exit(127);
>      }
>    }
>    else if (privs == PRIVILEGES_VIDEO) {

I think that this will need to be rebased, since I don't believe that PRIVILEGES_VIDEO exists any more.

@@ +416,5 @@
> +    char fname[NAME_MAX + 12];
> +    snprintf(fname, sizeof(fname), "/proc/%s/stat", slot->d_name);
> +    FILE *fp = fopen(fname, "r");
> +    if (!fp) {
> +      gProcessLog.print("Error: fopen(): %s: %s", fname, strerror(errno));

I think that this should probably be considered as "normal" and not have a message logged. It would happen if the process existed when the readdir was done, but if it exited by the time we got around to trying to open /proc/PID/stat

@@ +421,1 @@
>        return false;

This also seems wrong to me (not your code I realize).

If the stat entry can't be read because some process happens to go away at the wrong time, then this function returns false, rather than continuing, which feels wrong to me, but maybe I'm missing something.

@@ +425,4 @@
>      const char* result = fgets(buf, sizeof(buf), fp);
> +    if (!result) {
> +      if (errno != 0) {
> +        gProcessLog.print("Error: fgets(): %s: %s", fname, strerror(errno));

I'm similarly concerned that this might also be a "normal" occurrence.

@@ +431,3 @@
>        return false;
> +    }
> +    fclose(fp);

Another good catch
Attachment #730264 - Flags: review?(dhylands)
Bent, would you like to be a reviewer of this one? Maybe you would have some further suggestions before I address dhylands' suggestions?
Flags: needinfo?(bent.mozilla)
I would suggest cjones if he is available. Otherwise I can review it.
Flags: needinfo?(bent.mozilla)
Attachment #730264 - Flags: review?(cjones.bugs)
Comment on attachment 730264 [details] [diff] [review]
Preliminary patch

>diff --git i/ipc/chromium/src/base/process_util_linux.cc w/ipc/chromium/src/base/process_util_linux.cc

>@@ -201,55 +202,65 @@ bool LaunchApp(const std::vector<std::string>& argv,

>+    gProcessLog.print("Error launching %s: while duplicating environment: %s",
>+                      argv[0].c_str(), strerror(errno));

We should be using the LOG() helper to access this.

Let's factor the errno/strerror(errno) pattern into a helper that adds
it to the message text automatically, in the style of perror().

>@@ -273,71 +284,79 @@ void SetCurrentProcessPrivileges(ChildPrivileges privs) {

>-      ssize_t len = read(fd, buf, sizeof(buf) - 1);
>-      close(fd);
>+      ssize_t len = HANDLE_EINTR(read(fd, buf, sizeof(buf) - 1));
>       if (len < 0) {
>-        DLOG(ERROR) << "Failed to read pid_max";
>+        gProcessLog.print("Error: read(): %s: %s", fname, strerror(errno));
>+        close(fd);

You may as well just drop the close().  We're always either going to
successfully exec() or _exit(), so the kernel will always clean it up.

Pretty straightforward.  Let's just clean things up a bit.
Attachment #730264 - Flags: review?(cjones.bugs)
Assignee: vd → nobody
I have no time to work on Mozilla bugs for now, should have unassigned this from me some time ago already to give chance to others to crack it.
What is the current status of this bug?
If possible, I would like to work on it. 
I am a newbie and just catching up...
Priority: -- → P5
No assignee, updating the status.
Status: ASSIGNED → NEW
No assignee, updating the status.
No assignee, updating the status.

Hi, can I be assigned this bug? If so, any tips to get started?

Sam

Actually, I don't think we need this anymore. Most of the code that cjones was talking about in comment #0 (and patched in comment #5) has been removed, and I don't believe we've been having problems with missing the details of errors in the remaining code.

(Or, looked at another way, the subtle failure cases in today's Linux process launching are generally in the sandboxing code, and that does log errors, including on non-debug builds.)

Status: NEW → RESOLVED
Closed: 11 years ago5 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: