Closed Bug 998794 Opened 5 years ago Closed 5 years ago

Main thread I/O log integration into Talos

Categories

(Testing :: Talos, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aklotz, Assigned: jmaher)

References

Details

(Keywords: ateam-desktop-perf-goal)

Attachments

(6 files, 6 obsolete files)

10.20 KB, patch
aklotz
: review+
Details | Diff | Splinter Review
12.33 KB, patch
dminor
: review+
Details | Diff | Splinter Review
6.69 KB, patch
dminor
: review+
Details | Diff | Splinter Review
29.65 KB, application/json
Details
18.66 KB, patch
wlach
: review+
Details | Diff | Splinter Review
2.63 KB, patch
wlach
: review+
Details | Diff | Splinter Review
With bug 935482 landing we now have the ability to generate logs of main thread I/O that can be used during testing to flag I/O.

To enable the logging, set the MOZ_MAIN_THREAD_IO_LOG environment variable to a path pointing to the desired location of the log file.

Each line is comma-separated, but there are two different formats:
(1) Start Timestamp (ms), Operation, Duration (ms), Event Source, Filename;
(2) Start Tiemstamp (ms), NEXT-STAGE

Format (2) is used to signify that the browser has transitioned to the next stage of execution (startup to normal, normal to shutdown).

There will be additional formats once call stack logging has been implemented.
next steps:
* aklotz to write a script that will parse the io logs and determine pass/fail
* jmaher to integrate that script into talos
* jmaher to run on all platforms/tests via try
* aklotz/jmaher - discuss which platforms/tests we want to run on
* party
The whitelist code will need some additional enhancement once we have a better idea about what criteria we need. Right now it just supports ignoring based on a boolean.
Attachment #8430885 - Flags: review?(jmaher)
Comment on attachment 8430885 [details] [diff] [review]
Add main thread i/o parser and whitelisting class to Talos

Review of attachment 8430885 [details] [diff] [review]:
-----------------------------------------------------------------

a lot of little details here, lets take another round.

::: talos/mainthreadio.py
@@ +71,5 @@
> +    return True
> +
> +def main(argv):
> +    if len(argv) < 3:
> +        print "Usage: %s <main_thread_io_log_file> <xre_path>" % argv[0]

we should do some validation on the args

@@ +82,5 @@
> +                             paths = {"{xre}": argv[2]},
> +                             path_substitutions = PATH_SUBSTITUTIONS,
> +                             name_substitutions = NAME_SUBSTITUTIONS)
> +    try:
> +        wl.load('mtio-whitelist.json')

do we have this file defined somewhere?

::: talos/whitelist.py
@@ +16,5 @@
> +class whitelist:
> +    def __init__(self, test_name, paths, path_substitutions, name_substitutions, init_with=None):
> +        self.test_name = test_name
> +        self.listmap = init_with if init_with else {}
> +        self.dependent_libs = self.load_dependent_libs() if init_with and KEY_XRE in paths else {}

why are dependent_libs separate from listmap?

@@ +24,5 @@
> +
> +    def load(self, filename):
> +        self.load_dependent_libs()
> +        with open(filename, 'r') as f:
> +            self.listmap = json.load(f)

we never check if the filename is valid

@@ +48,5 @@
> +        for old_name, new_name in self.name_substitutions.iteritems():
> +            pathname = "\\%s" % old_name
> +            parts = filename.split(pathname)
> +            if len(parts) >= 2:
> +                filename = "%s\\{time}" % (parts[0], new_name)

why is this hardcoded to time?

@@ +61,5 @@
> +            if filename in self.listmap:
> +                if 'ignore' in self.listmap[filename] and self.listmap[filename]['ignore']:
> +                    continue
> +            elif filename in self.dependent_libs:
> +                continue

if we put all dependent_libs in listmap, that might work

@@ +91,5 @@
> +    return 0
> +
> +if __name__ == "__main__":
> +    import sys
> +    sys.exit(main(sys.argv))

we don't need this stuff, just leave out lines 90-95
Attachment #8430885 - Flags: review?(jmaher) → review-
(In reply to Joel Maher (:jmaher) from comment #3)
> Comment on attachment 8430885 [details] [diff] [review]
> Add main thread i/o parser and whitelisting class to Talos
> 
> Review of attachment 8430885 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> a lot of little details here, lets take another round.
> 
> ::: talos/mainthreadio.py
> @@ +71,5 @@
> > +    return True
> > +
> > +def main(argv):
> > +    if len(argv) < 3:
> > +        print "Usage: %s <main_thread_io_log_file> <xre_path>" % argv[0]
> 
> we should do some validation on the args

I added some existence checks in main.

> 
> @@ +82,5 @@
> > +                             paths = {"{xre}": argv[2]},
> > +                             path_substitutions = PATH_SUBSTITUTIONS,
> > +                             name_substitutions = NAME_SUBSTITUTIONS)
> > +    try:
> > +        wl.load('mtio-whitelist.json')
> 
> do we have this file defined somewhere?

I'll add an empty list. We don't really know whether anything needs to be whitelisted yet.

> 
> ::: talos/whitelist.py
> @@ +16,5 @@
> > +class whitelist:
> > +    def __init__(self, test_name, paths, path_substitutions, name_substitutions, init_with=None):
> > +        self.test_name = test_name
> > +        self.listmap = init_with if init_with else {}
> > +        self.dependent_libs = self.load_dependent_libs() if init_with and KEY_XRE in paths else {}
> 
> why are dependent_libs separate from listmap?

The intention here was that this class could be used not just to load whitelists, but also to generate them based on the output data from a test run. The idea here was to keep the dependent libs separate so that we could just save the whitelist without needing to filter out the dependent libs. Thoughts?

> 
> @@ +24,5 @@
> > +
> > +    def load(self, filename):
> > +        self.load_dependent_libs()
> > +        with open(filename, 'r') as f:
> > +            self.listmap = json.load(f)
> 
> we never check if the filename is valid

Putting my performance hat on for a moment, I prefer to avoid the "check for existence, then open" idiom and instead go straight for the open. If the file does open successfully, then it's one file syscall instead of two, and if the file can't be opened for any reason, we'll handle the error anyway; My preference is to add handlers for any IOError exceptions. Thoughts?

> 
> @@ +48,5 @@
> > +        for old_name, new_name in self.name_substitutions.iteritems():
> > +            pathname = "\\%s" % old_name
> > +            parts = filename.split(pathname)
> > +            if len(parts) >= 2:
> > +                filename = "%s\\{time}" % (parts[0], new_name)
> 
> why is this hardcoded to time?

It shouldn't have been. Thanks for catching that.

> 
> @@ +61,5 @@
> > +            if filename in self.listmap:
> > +                if 'ignore' in self.listmap[filename] and self.listmap[filename]['ignore']:
> > +                    continue
> > +            elif filename in self.dependent_libs:
> > +                continue
> 
> if we put all dependent_libs in listmap, that might work

Certainly could -- see my remarks above and let me know what you prefer.

> 
> @@ +91,5 @@
> > +    return 0
> > +
> > +if __name__ == "__main__":
> > +    import sys
> > +    sys.exit(main(sys.argv))
> 
> we don't need this stuff, just leave out lines 90-95

Whoops, that was some vestigial testing code that I forgot to remove. Wilco.
Flags: needinfo?(jmaher)
this all sounds good.  Regarding dependentlibs, please add a comment so the intention is understood in the future when we add a feature and refactor on accident.

On the other subject of checks for files then open, I agree a single IOError catch is faster and just as readable.  My biggest concern with try/except handling is we sometimes do 10 operations inside a single block and then spit out a 'oh no an error happened' message which doesn't really help pinpoint the problem.  In this case I would like to know if the file is missing or if the file is invalid- the details can be left up to you :)
Flags: needinfo?(jmaher)
Attachment #8430885 - Attachment is obsolete: true
Attachment #8432997 - Flags: review?(jmaher)
Comment on attachment 8432997 [details] [diff] [review]
Add main thread i/o parser and whitelisting class to Talos (rev. 2)

Review of attachment 8432997 [details] [diff] [review]:
-----------------------------------------------------------------

this seems a good.  How are you envisioning integrating this in with talos?  Would we just run this as a cleanup script after tp5?

::: talos/mainthreadio.py
@@ +35,5 @@
> +LEAKED_SYMLINK_PREFIX = "::\\{"
> +
> +PATH_SUBSTITUTIONS = {'profile': '{profile}', 'firefox': '{xre}',
> +                      'desktop': '{desktop}', 'talos': '{talos}'}
> +NAME_SUBSTITUTIONS = {'installtime': '{time}', 'prefetch': '{prefetch}'}

prefetch can be Prefetch or prefetch, we account for that in xperf:
http://hg.mozilla.org/build/talos/file/1cc819c9de27/talos/xtalos/etlparser.py#l340

::: talos/whitelist.py
@@ +28,5 @@
> +            with open(filename, 'r') as f:
> +                self.listmap = json.load(f)
> +        except IOError as e:
> +            print "%s: %s" % (e.filename, e.strerror)
> +            return

can we return false in the two error conditions in this function?

@@ +109,5 @@
> +            self.dependent_libs = {"%s\\%s" % (KEY_XRE, lib.strip()): {'ignore': True} for lib in libs}
> +            return True
> +        except IOError as e:
> +            print "%s: %s" % (e.filename, e.strerror)
> +            return

can we return False here, I don't want to assume 'return' doesn't evaulate to True
Attachment #8432997 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #7)
> Comment on attachment 8432997 [details] [diff] [review]
> Add main thread i/o parser and whitelisting class to Talos (rev. 2)
> 
> Review of attachment 8432997 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> this seems a good.  How are you envisioning integrating this in with talos? 
> Would we just run this as a cleanup script after tp5?

Yes. We'd need to set the MOZ_MAIN_THREAD_IO_LOG environment variable before running tp5, then we run this as a cleanup script.

> 
> ::: talos/mainthreadio.py
> @@ +35,5 @@
> > +LEAKED_SYMLINK_PREFIX = "::\\{"
> > +
> > +PATH_SUBSTITUTIONS = {'profile': '{profile}', 'firefox': '{xre}',
> > +                      'desktop': '{desktop}', 'talos': '{talos}'}
> > +NAME_SUBSTITUTIONS = {'installtime': '{time}', 'prefetch': '{prefetch}'}
> 
> prefetch can be Prefetch or prefetch, we account for that in xperf:
> http://hg.mozilla.org/build/talos/file/1cc819c9de27/talos/xtalos/etlparser.
> py#l340

Unless I'm missing something, that should be handled by lowercasing the input filename.

> 
> ::: talos/whitelist.py
> @@ +28,5 @@
> > +            with open(filename, 'r') as f:
> > +                self.listmap = json.load(f)
> > +        except IOError as e:
> > +            print "%s: %s" % (e.filename, e.strerror)
> > +            return
> 
> can we return false in the two error conditions in this function?

Done.

> 
> @@ +109,5 @@
> > +            self.dependent_libs = {"%s\\%s" % (KEY_XRE, lib.strip()): {'ignore': True} for lib in libs}
> > +            return True
> > +        except IOError as e:
> > +            print "%s: %s" % (e.filename, e.strerror)
> > +            return
> 
> can we return False here, I don't want to assume 'return' doesn't evaulate
> to True

Done.


(Carrying forward r+)
Attachment #8432997 - Attachment is obsolete: true
Attachment #8437196 - Flags: review+
Depends on: 1028999
Assignee: nobody → aklotz
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
this has 2 hacks left before we are ready to get review.

One of the hacks is that we are running this in xperf and it changes our file access so we need to adjust the xperf whitelist to account for this.  I should have this ready for review tomorrow.
Assignee: aklotz → jmaher
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
while this is not a perfect solution, it isn't too hacky either.  Open to questions/concerns.
Attachment #8456427 - Attachment is obsolete: true
Attachment #8457977 - Flags: review?(dminor)
Comment on attachment 8457977 [details] [diff] [review]
integration mainthreadio data into xperf run (1.0)

Review of attachment 8457977 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, looks good!

::: talos/cmanager_win32.py
@@ +140,5 @@
>        if counter.strip() == 'Main_RSS':
>          continue
> +
> +      if counter.strip() == 'mainthread_io':
> +        continue

Please add a comment explaining why we skip these here.

::: talos/output.py
@@ +292,5 @@
>              for i, (val, page) in enumerate(vals):
> +                try:
> +                    buffer.write("%d,%.2f,%s\n" % (i,float(val), page))
> +                except:
> +                    pass

Could you catch a specific exception here and/or log something?

::: talos/results.py
@@ +512,5 @@
> +        if not os.path.exists(filename):
> +            # silent failure is fine here as we will only see this on tp5n runs
> +            return
> +
> +        contents = file(filename).read()

I think a try/except here would be better than the os.path.exists above.
Attachment #8457977 - Flags: review?(dminor) → review+
thanks for the review.  I have addressed your concerns in the patch I landed:
https://hg.mozilla.org/build/talos/rev/c36273c3498d
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Blocks: 1044207
this isn't working as we though.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 8469621 [details] [diff] [review]
updates to test definitions to support mainthread io (1.0)

Review of attachment 8469621 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.

::: talos/test.py
@@ +436,5 @@
>  
>      tpscrolltest = True
>      """ASAP mode"""
>      tpmozafterpaint = False
> +    preferences = {'layout.frame_rate': 0, 

nit: extra whitespace

@@ +647,5 @@
>      tpmanifest = '${talos}/page_load_test/a11y/a11y.manifest'
>      tpcycles = 1
>      tppagecycles = 25
>      tpmozafterpaint = True
> +    preferences = {'dom.send_after_paint_to_content': False}

We do this a lot. Maybe we could have some default preferences in PageloaderTest to handle this?
Attachment #8469621 - Flags: review?(dminor) → review+
filed a followup bug 1050740 to address the bloat of preferences.
Depends on: 1050824
a list of what we discovered after running this for a week.  Next week we will see if our results map to the same files and accesses.
using our seed data, we now have a profile of what to expect.  I have created a mtio-whitelist.json with expected files and the option for extending the duration allotment >1.0ms.

tested on try:
https://tbpl.mozilla.org/?tree=Try&rev=748c5f4280fb (the last 2 x's, intentional failing on the last one)
Attachment #8495237 - Flags: review?(wlachance)
Attachment #8495237 - Flags: review?(aklotz)
Comment on attachment 8495237 [details] [diff] [review]
mainthread io support for files, duration (1.0)

Review of attachment 8495237 [details] [diff] [review]:
-----------------------------------------------------------------

Mostly looks fine, I have some issues with the error output suppression in TalosProcess.

::: talos/TalosProcess.py
@@ +74,5 @@
>          if line.find('__endTimestamp') != -1:
>              thread = Thread(target=self.waitForQuit)
>              thread.setDaemon(True) # don't hang on quit
>              thread.start()
> +        line = line.replace('rror', 'rr0r')

I'm not actually sure what this is for -- it looks like you're cancelling out the use of TalosProcess elsewhere in the patch. I guess there are other mainthread consumers still that are producing this?

I think the error output suppression would happen at a higher level. Otherwise people might accidentally use TalosProcess for stuff and unknowing suppress actually errors.

::: talos/ttest.py
@@ +495,5 @@
>                  if not os.path.isfile(browser_log_filename):
>                      raise TalosError("no output from browser [%s]" % browser_log_filename)
>  
>                  # check for xperf errors
> +                if os.path.exists(browser_config['error_filename']) or mainthread_error_count > 0:

Split this clause onto multiple lines.

::: talos/whitelist.py
@@ +25,1 @@
>  class whitelist:

Not part of this patch but classnames should always be capitalized.

@@ +38,5 @@
>          try:
>              with open(filename, 'r') as f:
> +                temp = json.load(f)
> +
> +            for f in temp:

Using 'f' for this variable is confusing. Can we use something more descriptive, like whitelist_element ?

@@ +55,5 @@
>              json.dump(self.listmap, f, sort_keys=True, indent=4, separators=(',', ': '))
>          print "Dependent libs: %r" % self.dependent_libs
>  
>      def sanitize_filename(self, filename):
> +        global PRE_PROFILE

If you made PRE_PROFILE part of this class you wouldn't have to make it a global. You could just refer to it as self.PRE_PROFILE.

@@ +74,5 @@
> +                filename = "%s%s" % (parts[0], new_name)
> +
> +        filename = filename.strip()
> +        filename = filename.strip('/')
> +        filename = filename.strip('\\')

You can collapse this to (at least):

filename = filename.strip('/\\\ \t')

>>> '\t cheezburger\\/'.strip('/\\\ \t')
'cheezburger'

https://docs.python.org/2/library/stdtypes.html#str.strip
Attachment #8495237 - Flags: review?(wlachance) → review-
Comment on attachment 8495237 [details] [diff] [review]
mainthread io support for files, duration (1.0)

Review of attachment 8495237 [details] [diff] [review]:
-----------------------------------------------------------------

OK by me as far as the bits that I know about.
Attachment #8495237 - Flags: review?(aklotz) → review+
example javascript suppression:
https://tbpl.mozilla.org/php/getParsedLog.php?id=48892132&tree=Try&full=1

addressed all other concerns from wlach.

aklotz, thanks for the review on the mtio logistics!
Attachment #8495237 - Attachment is obsolete: true
Attachment #8495491 - Flags: review?(wlachance)
Comment on attachment 8495491 [details] [diff] [review]
mainthread io support for files, duration (1.1)

Review of attachment 8495491 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good now!

::: talos/whitelist.py
@@ +38,2 @@
>          try:
> +            with open(filename, 'r') as fHandle:

To be clear earlier, I think 'f' is a fine name for a file variable, just not for the whitelist_name variable below.

@@ +76,1 @@
>          return filename

Could simplify this to just:

return filename.strip('/\\\ \t')
Attachment #8495491 - Flags: review?(wlachance) → review+
https://hg.mozilla.org/build/talos/rev/a8ce24a66add
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Blocks: 1076990
Duplicate of this bug: 1078409
oh and more hacks, but this time I have more confidence in this thanks to many green try runs:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=19866c9fc1d6
Attachment #8501697 - Flags: review?(wlachance)
Attachment #8501697 - Flags: review?(aklotz)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 8501697 [details] [diff] [review]
fix the pre profile path to support shortened and regular paths (1.0)

Review of attachment 8501697 [details] [diff] [review]:
-----------------------------------------------------------------

OK from my angle but see comment.

::: talos/whitelist.py
@@ +94,5 @@
>              else:
>                  if filename not in errors:
>                      errors[filename] = []
>                  errors[filename].append(test[row_key])
> +                print "JMAHER: got an error on file '%s', pre_profile: %s" % (filename, self.PRE_PROFILE)

Is this intentional? (That's fine if it is, just making sure!)
Attachment #8501697 - Flags: review?(aklotz) → review+
Thanks Aaron, I removed my debugging comment and will look forward to what Will has to say.
Attachment #8501697 - Attachment is obsolete: true
Attachment #8501697 - Flags: review?(wlachance)
Attachment #8501815 - Flags: review?(wlachance)
Comment on attachment 8501815 [details] [diff] [review]
fix the pre profile path to support shortened and regular paths (1.1)

Review of attachment 8501815 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm. Some things could be a bit more clear.

::: talos/whitelist.py
@@ +64,5 @@
>              if len(parts) >= 2:
>                  if self.PRE_PROFILE == '' and subst == '{profile}':
> +                    fname = self.sanitize_filename(parts[0])
> +                    self.listmap[fname] = {}
> +                    # Windows can have {appdata}\local\temp\longnamedfolder or {appdata}\local\temp\longna~1

Maybe add a comment in about how we're going to munge longnamedfolder -> longna~1

It wasn't immediately obvious to me that that was what you were doing from first glance at this patch.

@@ +65,5 @@
>                  if self.PRE_PROFILE == '' and subst == '{profile}':
> +                    fname = self.sanitize_filename(parts[0])
> +                    self.listmap[fname] = {}
> +                    # Windows can have {appdata}\local\temp\longnamedfolder or {appdata}\local\temp\longna~1
> +                    if not fname.endswith('~1'):

I think it would be slightly more clear if you reversed these:

if fname.endswith('~1'):
  ...
else:
  ...

@@ +133,5 @@
>          return error_strs
>  
>      def print_errors(self, error_strs):
>          for error_msg in error_strs:
> +            print "TEST-UNEXPECTED-FAIL | %s | %s" % (self.test_name, error_msg)

What does changing this buy us?
Attachment #8501815 - Flags: review?(wlachance) → review+
I know you gave a r+ previously, I looked at your feedback and worked towards doing that, I ended up making a few extra changes in that same block.
Attachment #8501815 - Attachment is obsolete: true
Attachment #8501976 - Flags: review?(wlachance)
Attachment #8501976 - Flags: review?(wlachance) → review+
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.