Closed Bug 1023371 Opened 8 years ago Closed 7 years ago

mozcrash should not directly print crash info to the console

Categories

(Testing :: Mozbase, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla34

People

(Reporter: jgraham, Assigned: jgraham)

References

Details

Attachments

(1 file, 5 obsolete files)

Or anything much really; it uses a mozlog logger for some informational messages and then prints directly to stdout.

It seems like a better design would be to hand back some information to the caller and let it deal with IO. That might mean fixing all the consumers, however, and there seem to be a number of these. So a possible interim solution would be to refactor into two functions, one with the current API which would call out to a new function that just returned the results as described. Removing the uses of mozlog might be harder; maybe it would be possible to pass in a logger object?
That's also something I tried to fix while improving mozcrash a couple weeks ago, but failed due to massive orangeness on TPBL. I thought I already filed a bug on it but doesn't look like.
OS: Linux → All
Hardware: x86_64 → All
Summary: mozcrash not compatible with structured logging → mozcrash should not directly print crash info to the console
The output of minidump_stackwalk is a wall of text. It'd be fine to add a new API that returns that output instead of printing it directly, and just call it from the existing API. I don't think you want to put that output into a structured log, it's kind of large.
First cut at this:
https://github.com/jgraham/gecko-dev/commit/f3f4f977aaddbd20afbd467f95282f6154f86047

Depends on wlach's changes in bug 1021943. Rather untested, by which I mean that it passes the unit tests for mozcrash, but that's all I tried.
Attached as a patch since the github diff viewer is so terrible. Not ready in any
sense but ready for comments on the approach.
Attachment #8438406 - Flags: feedback?(ted)
Attachment #8438406 - Flags: feedback?(cmanchester)
Comment on attachment 8438406 [details] [diff] [review]
First cut at making mozcrash work with structured logging

Review of attachment 8438406 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/mozbase/mozcrash/mozcrash/mozcrash.py
@@ +22,5 @@
> +def get_logger():
> +    structured_logger = structuredlog.get_default_logger("mozcrash")
> +    if structured_logger is None:
> +        return mozlog.getLogger('mozcrash')
> +    return structured_logger

This makes me a bit anxious as a general pattern, because the return is a "logger-like" object that could conceivably cause weirdness if users deviate from the usual logging api. Probably rare in practice.

If we want to leave current callers alone, isn't this an opportunity to make use of http://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/structured/stdadapter.py ?
Attachment #8438406 - Flags: feedback?(cmanchester) → feedback+
(In reply to Chris Manchester [:chmanchester] from comment #5)
> Comment on attachment 8438406 [details] [diff] [review]
> First cut at making mozcrash work with structured logging
> 
> Review of attachment 8438406 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: testing/mozbase/mozcrash/mozcrash/mozcrash.py
> @@ +22,5 @@
> > +def get_logger():
> > +    structured_logger = structuredlog.get_default_logger("mozcrash")
> > +    if structured_logger is None:
> > +        return mozlog.getLogger('mozcrash')
> > +    return structured_logger
> 
> This makes me a bit anxious as a general pattern, because the return is a
> "logger-like" object that could conceivably cause weirdness if users deviate
> from the usual logging api. Probably rare in practice.

So I don't think this is a big problem; in the cases where we are going to be introducing structured logging to a module, we control all the code. So if something is using the logging API in a weird way we will have to fix it to not do that. But we will be changing the code anyway, so that's not a big deal.
 
> If we want to leave current callers alone, isn't this an opportunity to make
> use of
> http://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/
> structured/stdadapter.py ?

That code is really only intended for third-party libraries that we don't control at all. It's a bit of a hack and it makes me quite nervous. More nervous than relying on duck typing in code that we do control.
This makes it compatible with structured logging.
Attachment #8439904 - Flags: review?(ted)
Attachment #8438406 - Attachment is obsolete: true
Attachment #8438406 - Flags: feedback?(ted)
This makes it compatible with structured logging.
Attachment #8439904 - Attachment is obsolete: true
Attachment #8439904 - Flags: review?(ted)
Attachment #8440789 - Flags: review?(ted)
https://tbpl.mozilla.org/?tree=Try&rev=7f5bc165b9f5

Second try run of XPCShell tests. This seems to be OK on Linux and OSX but some tests are failing on Windows.
Seems to have been an unrelated problem, as a retry on newer m-c worked

https://tbpl.mozilla.org/?tree=Try&rev=0d7c0cd3ad61
Blocks: 945222
Blocks: 1034272
Comment on attachment 8440789 [details] [diff] [review]
Refactor mozcrash to seperate out the IO from the stack inspection.

Review of attachment 8440789 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/mozbase/mozcrash/mozcrash/mozcrash.py
@@ +89,2 @@
>                  print '\n'.join(stackwalk_output)
> +                print '\n'.join(dump["errors"])

I don't think I understand how this is intended to be used. We still print to the console, but we'll also invoke the crash action in the logger with the value generated by the CrashInfo object?
Comment on attachment 8440789 [details] [diff] [review]
Refactor mozcrash to seperate out the IO from the stack inspection.

Review of attachment 8440789 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/mozbase/mozcrash/mozcrash/mozcrash.py
@@ +89,2 @@
>                  print '\n'.join(stackwalk_output)
> +                print '\n'.join(dump["errors"])

No, the idea was that this function is for legacy-compat only. For structured logging one uses the CrashInfo object directly.

It's possible that we need a helper function that handles the logging for the structured case too, however.
Comment on attachment 8440789 [details] [diff] [review]
Refactor mozcrash to seperate out the IO from the stack inspection.

Review of attachment 8440789 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, I'm not sure why this kept falling out of view.

So a few general comments:
* I don't think the context manager bit is really necessary. The MINIDUMP_SAVE_PATH logic should all be internal to this method so that we don't have to deal with it in every caller. Given that, you should be able to do all the processing and just return an array of information about processed dumps (what you're currently returning in the __iter__). Then callers can simply iterate over the results and log them as desired.
* Would we rather just allow callers to pass in a structured logger and have all the results logged there instead of via plain mozlog? It seems silly to burden callers with knowing how to deal with crash info, we could keep the current model where they simply get a boolean return from check_for_crashes which says "there were crashes that I processed". The more we can consolidate the logging, the more likely it is that any harness using mozcrash does the right thing in terms of TBPL formatting etc.

::: testing/mozbase/mozlog/mozlog/structured/formatters/machformatter.py
@@ +89,5 @@
> +        else:
> +            success = False
> +
> +        return ("Process crashed pid:%s. Test:%s. Minidump anaylsed:%s. Top frame:[%s] " %
> +                (data.get("pid", None), test, success, data["top_frame"]))

I don't really get why our mach formatter and tbpl formatter are different. Are we doing anything useful with this distinction?

::: testing/mozbase/mozlog/mozlog/structured/structuredlog.py
@@ +239,5 @@
> +            data["stackwalk_stdout"] = stackwalk_stdout
> +        if stackwalk_stderr is not None:
> +            data["stackwalk_stderr"] = stackwalk_stderr
> +        if errors is not None:
> +            data["stackwalk_errors"] = errors

Boy, this is tedious to write, isn't it?
Attachment #8440789 - Flags: review?(ted) → review-
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #14)
> Comment on attachment 8440789 [details] [diff] [review]
> Refactor mozcrash to seperate out the IO from the stack inspection.
> 
> Review of attachment 8440789 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Sorry, I'm not sure why this kept falling out of view.
> 
> So a few general comments:
> * I don't think the context manager bit is really necessary. The
> MINIDUMP_SAVE_PATH logic should all be internal to this method so that we
> don't have to deal with it in every caller. Given that, you should be able
> to do all the processing and just return an array of information about
> processed dumps (what you're currently returning in the __iter__). Then
> callers can simply iterate over the results and log them as desired.

I can see that the context manager is unnecessary, but it seems like a reasonable idea to keep the iterator so that the data can be processed lazilly rather than forcing it all to be processed up-front. It's pretty trivial to add a list() around the result if that's what you want.

> * Would we rather just allow callers to pass in a structured logger and have
> all the results logged there instead of via plain mozlog? It seems silly to
> burden callers with knowing how to deal with crash info, we could keep the
> current model where they simply get a boolean return from check_for_crashes
> which says "there were crashes that I processed". The more we can
> consolidate the logging, the more likely it is that any harness using
> mozcrash does the right thing in terms of TBPL formatting etc.

I agree that having a convenience function where you pass in a logger makes some sense. But I think structured logging makes it pretty hard to get wrong; you just log each crash as a crash and it gets automatically formatted in the right way.

> ::: testing/mozbase/mozlog/mozlog/structured/formatters/machformatter.py
> @@ +89,5 @@
> > +        else:
> > +            success = False
> > +
> > +        return ("Process crashed pid:%s. Test:%s. Minidump anaylsed:%s. Top frame:[%s] " %
> > +                (data.get("pid", None), test, success, data["top_frame"]))
> 
> I don't really get why our mach formatter and tbpl formatter are different.
> Are we doing anything useful with this distinction?

In general or in this specific case? In general the mach formatter is designed to be more concise and human-readable whereas the tbpl formatter has to conform to the requirements imposed on us by the tbpl regexp.
 
> ::: testing/mozbase/mozlog/mozlog/structured/structuredlog.py
> @@ +239,5 @@
> > +            data["stackwalk_stdout"] = stackwalk_stdout
> > +        if stackwalk_stderr is not None:
> > +            data["stackwalk_stderr"] = stackwalk_stderr
> > +        if errors is not None:
> > +            data["stackwalk_errors"] = errors
> 
> Boy, this is tedious to write, isn't it?

Yeah, but the reason for all the prefixes is that this data seems rather specific to the stackwalker binary. It seems quite conceivable that we'll want different output in the future and having more specific field names will help.
(In reply to James Graham [:jgraham] from comment #15)
> I agree that having a convenience function where you pass in a logger makes
> some sense. But I think structured logging makes it pretty hard to get
> wrong; you just log each crash as a crash and it gets automatically
> formatted in the right way.

Sure, but having to write:

with mozcrash.CrashInfo() as crash:
  for crash in crash.dumps:
    log.crash(crash)

is still more verbose than:
mozcrash.check_for_crashes()


and every additional method we require harnesses to call is one more possible bug in reporting.
Updated somewhat, but not fully tested yet. Is this basic design OK?
Attachment #8440789 - Attachment is obsolete: true
Attachment #8468499 - Flags: feedback?(ted)
This makes it compatible with structured logging.
Attachment #8468499 - Attachment is obsolete: true
Attachment #8468499 - Flags: feedback?(ted)
Attachment #8469202 - Flags: review?(ted)
Comment on attachment 8469202 [details] [diff] [review]
Refactor mozcrash to seperate out the IO from the stack inspection.

Review of attachment 8469202 [details] [diff] [review]:
-----------------------------------------------------------------

Apart from the pointless "line too long" warning, prior to this patch mozcrash.py had zero pyflakes warnings. With it, I get:

mozcrash.py:22:1: E302 expected 2 blank lines, found 1
mozcrash.py:28:1: E302 expected 2 blank lines, found 1
mozcrash.py:155:29: E221 multiple spaces before operator
mozcrash.py:198:13: E129 visually indented line with same indent as next logical line
mozcrash.py:239:30: E231 missing whitespace after ':'
mozcrash.py:240:26: E231 missing whitespace after ':'
mozcrash.py:241:33: E231 missing whitespace after ':'
mozcrash.py:242:33: E231 missing whitespace after ':'
mozcrash.py:243:34: E231 missing whitespace after ':'
mozcrash.py:244:33: E231 missing whitespace after ':'
mozcrash.py:267:1: E302 expected 2 blank lines, found 1

::: testing/mozbase/mozcrash/mozcrash/mozcrash.py
@@ +92,5 @@
>  
>  
> +def log_crashes(logger, dump_directory, symbols_path, process=None, test=None,
> +                stackwalk_binary=None, dump_save_path=None):
> +    """Log crahses using a structured logger"""

s/crahses/crashes/
Fixed a small number of whitespace errors.
Attachment #8469202 - Attachment is obsolete: true
Attachment #8469202 - Flags: review?(ted)
Attachment #8469231 - Flags: review?(ted)
Comment on attachment 8469231 [details] [diff] [review]
Refactor mozcrash to seperate out the IO from the stack inspection.

Review of attachment 8469231 [details] [diff] [review]:
-----------------------------------------------------------------

I have a bunch of comments (most of them nitpicky stuff), but overall this looks good. If you want to just fix these and land with r=me that's fine, if you want a re-review that's fine as well.

::: testing/mozbase/mozcrash/mozcrash/mozcrash.py
@@ +75,2 @@
>  
> +    for dump in crash_info:

I would call this variable "info" instead of "dump", as dump has a specific meaning in this code (minidump file produced by the application).

@@ +93,5 @@
>      return True
>  
>  
> +def log_crashes(logger, dump_directory, symbols_path, process=None, test=None,
> +                stackwalk_binary=None, dump_save_path=None):

I don't think there's anything in the Python style guide about formatting function definitions like this, but when they get this many parameters I tend to prefer one-per-line.

@@ +94,5 @@
>  
>  
> +def log_crashes(logger, dump_directory, symbols_path, process=None, test=None,
> +                stackwalk_binary=None, dump_save_path=None):
> +    """Log crahses using a structured logger"""

typo: crashes

@@ +95,5 @@
>  
> +def log_crashes(logger, dump_directory, symbols_path, process=None, test=None,
> +                stackwalk_binary=None, dump_save_path=None):
> +    """Log crahses using a structured logger"""
> +    for dump in CrashInfo(dump_directory, symbols_path, dump_save_path=dump_save_path,

info for the variable name here as well.

@@ +101,5 @@
> +        logger.crash(process=process, test=test, **dump)
> +
> +
> +class CrashInfo(object):
> +    """Get infromation about a crash based on dump files.

typo: information

@@ +103,5 @@
> +
> +class CrashInfo(object):
> +    """Get infromation about a crash based on dump files.
> +
> +    Typical usage is to iterate over the CrashInfo object.

You might describe what iterating over the object gives you.

@@ +169,5 @@
> +            rv = self._process_dump_file(path, extra)
> +            yield rv
> +
> +        if self.remove_symbols:
> +            mozfile.remove(self.symbols_path)

This feels like a weird place to put this cleanup. I don't have a great alternate suggestion except to use a __del__ method. (You don't need to change this unless you have a better idea.)

@@ +176,5 @@
> +        """Process a single dump file using self.stackwalk_binary, and return a
> +        dictionary containing properties of the crash dump.
> +
> +        :param path: Path to the minidump file to analyse
> +        :return: A dictionary of the form::

I think using a namedtuple subclass for the return value would be nicer.

@@ +179,5 @@
> +        :param path: Path to the minidump file to analyse
> +        :return: A dictionary of the form::
> +                   minidump_path: Path of the dump file
> +                   top_frame: The top frame of the stack trace, or None if it
> +                              could not be determined.

Can we rename this to 'signature'? That's what crash-stats uses for this term. (I know it's called top_frame in the old code.)

@@ +183,5 @@
> +                              could not be determined.
> +                   stackwalk_stdout: String of stdout data from stackwalk
> +                   stackwalk_stderr: String of stderr data from stackwalk or
> +                                     None if it succeeded
> +                   stackwalk_retcode: Retun code from stackwalk

typo: Return

::: testing/mozbase/mozlog/mozlog/structured/structuredlog.py
@@ +9,5 @@
>  import json
>  import time
>  
> +from logtypes import Unicode, TestId, Status, SubStatus, Dict, List, Int
> +from logtypes import log_action, convertor_registry

I don't know what the actual Python convention is for multiple imports like this, but I like the one we're using elsewhere of using parens and one import per line like:
from logtypes import (
    Unicode,
    TestId,
...
)
Attachment #8469231 - Flags: review?(ted) → review+
Assignee: nobody → james
Addressed the review comments and pushed as

https://hg.mozilla.org/integration/mozilla-inbound/rev/64717933c727
https://hg.mozilla.org/mozilla-central/rev/64717933c727
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
You need to log in before you can comment on or make changes to this bug.