Closed Bug 1468511 Opened 6 years ago Closed 6 years ago

Enhanced xperf analysis

Categories

(Testing :: Talos, enhancement, P1)

enhancement

Tracking

(firefox63 fixed)

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: bugzilla, Assigned: bugzilla)

References

(Blocks 1 open bug)

Details

(Whiteboard: [PI:July])

Attachments

(1 file, 2 obsolete files)

I need to be able to enhance our xperf analysis to be able to track more events than the current analysis scripts can cover.

I have a prototype at https://github.com/dblohm7/xperf that provides expressions that may be used to extract attributes from events. I'd like to adapt that to both support the existing xperf use case of I/O monitoring, as well as the new stuff that I need for the bootstrap process.
Whiteboard: [PI:July]
Depends on: 1477870
Depends on: 1480870
Depends on: 1480911
No longer depends on: 1477870
Depends on: 1480912
Depends on: 1480913
:igoldan, is it possible that you could help :aklotz out with related talos issues as seen on this bug and dependent bugs?
Flags: needinfo?(igoldan)
Depends on: 1481672
as a note in bug 1439588, we are really close to having elevated privileges for running xperf again.  What we need is some taskcluster team help to add the scope and osgroup properly so this will magically work; I have been doing this manually on try with the edit/retrigger to verify it is working.  In doing that I landed code to fix imports as well as some etl/csv parsing.

After that is fixed (maybe a week), we still have the python3 work, if it is too unreasonable to port to python2, then we can review in a vidyo session next week to sort out a path forward.
Attached patch Add xperf_analyzer to xtalos (obsolete) — Splinter Review
Attachment #9000043 - Flags: review?(jmaher)
Comment on attachment 9000043 [details] [diff] [review]
Add xperf_analyzer to xtalos

Review of attachment 9000043 [details] [diff] [review]:
-----------------------------------------------------------------

overall this is a useful piece of code, can we replace much of our etlparser.py code with this?  I think we could replace much of it.  I have a few comments, please address them and I can review again.

::: testing/talos/talos/xtalos/xperf_analyzer.py
@@ +514,5 @@
> +    """
> +    def __init__(self, *events):
> +        super(EventSequence, self).__init__(list(events))
> +        if len(events) < 2:
> +            raise Exception("Why are you using this?")

can this be more descriptive?

@@ +627,5 @@
> +class SessionStoreWindowRestored(ClassicEvent):
> +    """ The Firefox session store window restored event """
> +    def __init__(self):
> +        super(SessionStoreWindowRestored, self).__init__(
> +            '{917B96B1-ECAD-4DAB-A760-8D49027748AE}')

this seems specific to firefox, is this defined in tree somewhere?

@@ +816,5 @@
> +            evt_name = 'FileIoWrite'
> +            self.strverb = 'Write'
> +        else:
> +            evt_name = 'FileIoRead'
> +            self.strverb = 'Read'

I don't understand why verb=Write and !verb=Read

@@ +892,5 @@
> +
> +        if self.etlfile:
> +            if kwargs['csvout']:
> +                self.csvout = os.path.abspath(kwargs['csvout'])
> +            self.etl2csv()

it would be nice to add a comment here indicating that etl2csv sets self.csvfile

@@ +906,5 @@
> +
> +        self.sess = XPerfSession()
> +
> +    def add_attr(self, attr):
> +        attr.set_session(self.sess)

what happens if self.sess = None?

@@ +913,5 @@
> +        """ Currently this class does not accept a path to xperf as an
> +        argument, but rather searches for xperf on the system PATH.
> +        """
> +        if self.xperf_path:
> +            return self.xperf_path

luckily in the new win7 images this is now in the system path we see in the env variables.  We do pass in the full path to xperf from talos and expect that in the xtalos framework, is there a reason for adding this functionality?  can we ensure that we pass in xperf from the cli or at least not ignore that?

@@ +933,5 @@
> +        (base, leaf) = os.path.split(user)
> +        merged = os.path.join(base, 'merged.etl')
> +
> +        xperf_cmd = [self.get_xperf_path(), '-merge', user, kernel, merged]
> +        subprocess.call(xperf_cmd)

I would like to print this command out if we are in debug mode

@@ +946,5 @@
> +            abs_csv_name = os.path.join(base, "{}.csv".format((leaf)))
> +
> +        xperf_cmd = [self.get_xperf_path(), '-i', self.etlfile, '-o',
> +                     abs_csv_name]
> +        subprocess.call(xperf_cmd)

print this command out if we are in debug mode

@@ +985,5 @@
> +        while True:
> +            try:
> +                row = csvdata.next()
> +            except StopIteration:
> +                break

the error condition here is we break and return from this function, which then self.data = ?.  Should we have a better error handling method here?

@@ +1058,5 @@
> +            help='Specify a path to save the interim csv file to disk')
> +        etls_parser.add_argument(
> +            '--keepcsv', required=False,
> +            help='Do not delete the interim csv file that was written to disk',
> +            action='store_true')

where do we use etls_parser?  can we remove it?

@@ +1068,5 @@
> +            help="Path to a .csv file generated by xperf")
> +        # We always imply --keepcsv when running in csv mode
> +        csv_parser.add_argument(
> +            '--keepcsv', required=False, help=argparse.SUPPRESS,
> +            action='store_true', default=True)

where do we use csv_parser?  can we remove this?

@@ +1106,5 @@
> +                BindThread(FileIOReadOrWrite(FileIOReadOrWrite.READ)))
> +            read_counter = XPerfCounter(browser_main_thread_file_io_read,
> +                                        output=structured_output,
> +                                        filters=myfilters)
> +            # etl.add_attr(read_counter)

why is this commented out?  should we just delete it, if we keep it in please add a comment.

@@ +1131,5 @@
> +
> +            etl.analyze()
> +
> +            # read_counter.process()
> +            # write_counter.process()

why are these commented out?  Future expansion?
Attachment #9000043 - Flags: review?(jmaher) → review-
I'll address other comments in the next revision of the patch, but I have a question about this:

(In reply to Joel Maher ( :jmaher ) (UTC+2) from comment #4)
> Comment on attachment 9000043 [details] [diff] [review]
> Add xperf_analyzer to xtalos
> 
> Review of attachment 9000043 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +985,5 @@
> > +        while True:
> > +            try:
> > +                row = csvdata.next()
> > +            except StopIteration:
> > +                break
> 
> the error condition here is we break and return from this function, which
> then self.data = ?.  Should we have a better error handling method here?
> 

This is the same behavior as the changes that you just made to filterOutHeader in etlparser for bug 1480870, no?
Flags: needinfo?(jmaher)
hmm, good point- possibly my code could be better- but we know it works as it is- possibly we leave it alone for now.
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher ) (UTC+2) from comment #4)
> Comment on attachment 9000043 [details] [diff] [review]
> Add xperf_analyzer to xtalos
> 
> Review of attachment 9000043 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> overall this is a useful piece of code, can we replace much of our
> etlparser.py code with this?  I think we could replace much of it.

Yes, I think that ideally we would eventually want to use it for all of our xperf analyses. It does need to be expanded a bit more to be able to replace the existing etlparser functionality (in particular, network I/O), but that is pretty straightforward.

> ::: testing/talos/talos/xtalos/xperf_analyzer.py
> @@ +514,5 @@
> > +    """
> > +    def __init__(self, *events):
> > +        super(EventSequence, self).__init__(list(events))
> > +        if len(events) < 2:
> > +            raise Exception("Why are you using this?")
> 
> can this be more descriptive?

Fixed.

> 
> @@ +627,5 @@
> > +class SessionStoreWindowRestored(ClassicEvent):
> > +    """ The Firefox session store window restored event """
> > +    def __init__(self):
> > +        super(SessionStoreWindowRestored, self).__init__(
> > +            '{917B96B1-ECAD-4DAB-A760-8D49027748AE}')
> 
> this seems specific to firefox, is this defined in tree somewhere?

Yes. Defined at https://searchfox.org/mozilla-central/source/toolkit/components/startup/mozprofilerprobe.mof; this is also the same constant as CEVT_WINDOWS_RESTORED in etlparser.py.

> 
> @@ +816,5 @@
> > +            evt_name = 'FileIoWrite'
> > +            self.strverb = 'Write'
> > +        else:
> > +            evt_name = 'FileIoRead'
> > +            self.strverb = 'Read'
> 
> I don't understand why verb=Write and !verb=Read

Mainly laziness on my part! I agree that's confusing and I will replace with explicit checks for the constants.

> 
> @@ +892,5 @@
> > +
> > +        if self.etlfile:
> > +            if kwargs['csvout']:
> > +                self.csvout = os.path.abspath(kwargs['csvout'])
> > +            self.etl2csv()
> 
> it would be nice to add a comment here indicating that etl2csv sets
> self.csvfile

I changed etl2csv to return the file name instead, so that both assignments to self.csvfile occur in the same place.

> 
> @@ +906,5 @@
> > +
> > +        self.sess = XPerfSession()
> > +
> > +    def add_attr(self, attr):
> > +        attr.set_session(self.sess)
> 
> what happens if self.sess = None?

If that happens, then somebody has already suppressed an exception that was thrown by __init__.

> 
> @@ +913,5 @@
> > +        """ Currently this class does not accept a path to xperf as an
> > +        argument, but rather searches for xperf on the system PATH.
> > +        """
> > +        if self.xperf_path:
> > +            return self.xperf_path
> 
> luckily in the new win7 images this is now in the system path we see in the
> env variables.  We do pass in the full path to xperf from talos and expect
> that in the xtalos framework, is there a reason for adding this
> functionality?  can we ensure that we pass in xperf from the cli or at least
> not ignore that?

Sure, I can add that. This part was more for convenience when I was initially writing this and running it from the command line.

> 
> @@ +933,5 @@
> > +        (base, leaf) = os.path.split(user)
> > +        merged = os.path.join(base, 'merged.etl')
> > +
> > +        xperf_cmd = [self.get_xperf_path(), '-merge', user, kernel, merged]
> > +        subprocess.call(xperf_cmd)
> 
> I would like to print this command out if we are in debug mode

Fixed.

> 
> @@ +946,5 @@
> > +            abs_csv_name = os.path.join(base, "{}.csv".format((leaf)))
> > +
> > +        xperf_cmd = [self.get_xperf_path(), '-i', self.etlfile, '-o',
> > +                     abs_csv_name]
> > +        subprocess.call(xperf_cmd)
> 
> print this command out if we are in debug mode

Fixed.

 
> @@ +1058,5 @@
> > +            help='Specify a path to save the interim csv file to disk')
> > +        etls_parser.add_argument(
> > +            '--keepcsv', required=False,
> > +            help='Do not delete the interim csv file that was written to disk',
> > +            action='store_true')
> 
> where do we use etls_parser?  can we remove it?

It's useful when running xperf_analyzer as a standalone command for development and testing. I can remove main() etc, if you want...

> 
> @@ +1068,5 @@
> > +            help="Path to a .csv file generated by xperf")
> > +        # We always imply --keepcsv when running in csv mode
> > +        csv_parser.add_argument(
> > +            '--keepcsv', required=False, help=argparse.SUPPRESS,
> > +            action='store_true', default=True)
> 
> where do we use csv_parser?  can we remove this?

Again, it's useful when running xperf_analyzer as a standalone command.

> 
> @@ +1106,5 @@
> > +                BindThread(FileIOReadOrWrite(FileIOReadOrWrite.READ)))
> > +            read_counter = XPerfCounter(browser_main_thread_file_io_read,
> > +                                        output=structured_output,
> > +                                        filters=myfilters)
> > +            # etl.add_attr(read_counter)
> 
> why is this commented out?  should we just delete it, if we keep it in
> please add a comment.
> 
> @@ +1131,5 @@
> > +
> > +            etl.analyze()
> > +
> > +            # read_counter.process()
> > +            # write_counter.process()
> 
> why are these commented out?  Future expansion?

Yeah, mainly test code. I'll just remove these.
Attachment #9000043 - Attachment is obsolete: true
Attachment #9001398 - Flags: review?(jmaher)
Comment on attachment 9001398 [details] [diff] [review]
Add xperf_analyzer to xtalos (r2)

Review of attachment 9001398 [details] [diff] [review]:
-----------------------------------------------------------------

really close but a handful of nits related to exception messages/error handling.

::: testing/talos/talos/xtalos/xperf_analyzer.py
@@ +465,5 @@
> +        self.match_count = 0
> +
> +    def on_event_matched(self, evt):
> +        if evt != self.event:
> +            raise Exception("We are not wrapping this event")

nit: add more context to this exception, what is this event?

@@ +514,5 @@
> +    """
> +    def __init__(self, *events):
> +        super(EventSequence, self).__init__(list(events))
> +        if len(events) < 2:
> +            raise Exception('EventSequence requires at least two events')

nit: indicate how many events are seen, maybe what they are since it would be 0 or 1

@@ +522,5 @@
> +    def on_event_matched(self, evt):
> +        unseen_events = len(self.events) > 0
> +        if unseen_events and evt != self.events[0] or not unseen_events and \
> +           evt != self.seen_events[-1]:
> +            raise Exception("We are not executing this event")

what is this event?

@@ +572,5 @@
> +        self.tid = None
> +
> +    def on_event_matched(self, evt):
> +        if evt != self.event:
> +            raise Exception("We are not wrapping this event")

what is this event?  a bit more context here :)

@@ +627,5 @@
> +class SessionStoreWindowRestored(ClassicEvent):
> +    """ The Firefox session store window restored event """
> +    def __init__(self):
> +        super(SessionStoreWindowRestored, self).__init__(
> +            '{917B96B1-ECAD-4DAB-A760-8D49027748AE}')

can you comment here where you got this GUID from?  I would prefer if this was a CONST at the top of the file with a related comment so we could find it easier- open to keeping it here with better docs.

@@ +725,5 @@
> +        if not ThreadStart.process_index:
> +            ThreadStart.process_index = self.get_field_index(
> +                'Process Name ( PID)')
> +
> +        m = ThreadStart.pid_extractor.match(row[ThreadStart.process_index])

what if m doesn't match anything, will m.group(1) throw an exception?
Attachment #9001398 - Flags: review?(jmaher) → review-
Flags: needinfo?(igoldan)
(In reply to Joel Maher ( :jmaher ) (UTC+2) from comment #8)
> Comment on attachment 9001398 [details] [diff] [review]
> Add xperf_analyzer to xtalos (r2)
> 
> Review of attachment 9001398 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> really close but a handful of nits related to exception messages/error
> handling.
> 
> ::: testing/talos/talos/xtalos/xperf_analyzer.py
> @@ +465,5 @@
> > +        self.match_count = 0
> > +
> > +    def on_event_matched(self, evt):
> > +        if evt != self.event:
> > +            raise Exception("We are not wrapping this event")
> 
> nit: add more context to this exception, what is this event?

Sure, added more details.

> 
> @@ +514,5 @@
> > +    """
> > +    def __init__(self, *events):
> > +        super(EventSequence, self).__init__(list(events))
> > +        if len(events) < 2:
> > +            raise Exception('EventSequence requires at least two events')
> 
> nit: indicate how many events are seen, maybe what they are since it would
> be 0 or 1

Fixed

> 
> @@ +522,5 @@
> > +    def on_event_matched(self, evt):
> > +        unseen_events = len(self.events) > 0
> > +        if unseen_events and evt != self.events[0] or not unseen_events and \
> > +           evt != self.seen_events[-1]:
> > +            raise Exception("We are not executing this event")
> 
> what is this event?

Fixed

> 
> @@ +572,5 @@
> > +        self.tid = None
> > +
> > +    def on_event_matched(self, evt):
> > +        if evt != self.event:
> > +            raise Exception("We are not wrapping this event")
> 
> what is this event?  a bit more context here :)

Fixed

> 
> @@ +627,5 @@
> > +class SessionStoreWindowRestored(ClassicEvent):
> > +    """ The Firefox session store window restored event """
> > +    def __init__(self):
> > +        super(SessionStoreWindowRestored, self).__init__(
> > +            '{917B96B1-ECAD-4DAB-A760-8D49027748AE}')
> 
> can you comment here where you got this GUID from?  I would prefer if this
> was a CONST at the top of the file with a related comment so we could find
> it easier- open to keeping it here with better docs.

Sure, I made it a constant declared at the top, with a comment pointing to the
location of the MOF file.

> 
> @@ +725,5 @@
> > +        if not ThreadStart.process_index:
> > +            ThreadStart.process_index = self.get_field_index(
> > +                'Process Name ( PID)')
> > +
> > +        m = ThreadStart.pid_extractor.match(row[ThreadStart.process_index])
> 
> what if m doesn't match anything, will m.group(1) throw an exception?

It will return None. Subsequent code might raise when processing that None value, but honestly, if that happens it is a catastrophic failure that must be addressed. I can't think of a scenario where we can actually handle such a failure safely.
Attachment #9001398 - Attachment is obsolete: true
Attachment #9002076 - Flags: review?(jmaher)
Comment on attachment 9002076 [details] [diff] [review]
Add xperf_analyzer to xtalos (r3)

Review of attachment 9002076 [details] [diff] [review]:
-----------------------------------------------------------------

thanks, this is looking good
Attachment #9002076 - Flags: review?(jmaher) → review+
https://hg.mozilla.org/mozilla-central/rev/127d6ad1af54
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: