Closed Bug 1423353 Opened 2 years ago Closed 2 years ago

talos has intermittent xperf file access- lets fix this

Categories

(Testing :: Talos, defect)

defect
Not set

Tracking

(firefox59 fixed)

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: jmaher, Assigned: igoldan)

References

Details

(Whiteboard: [PI:January])

Attachments

(1 file)

{profile}\extensions\talos-powers@mozilla.org\install.rdf
{talos}\talos\tests\tp5n\cnet.com\i.i.com.com\cnwk.1d\i\tron
{talos}\talos\tests\tp5n\yelp.com\media1.ct.yelpcdn.com\photo
{talos}\talos\tests\tp5n\alibaba.com\i03.i.aliimg.com\images\eng\style\css_images
{firefox}\crash reports\lastcrash
c:\slave\test\build\venv\lib\site-packages\pip\_vendor
{firefox}\browser\features\aushelper@mozilla.org.xpi
c:\users\cltbld.t-w732-ix-112.001\appdata\local\temp
{firefox}\browser\extensions\{uuid}.xpi
{firefox}\browser\features\flyweb@mozilla.org.xpi
{profile}\extensions\talos-powers@mozilla.org\install.rdf
\fi_unknown


most if not all of these are not useful, the two that we should verify are expected are:
{firefox}\browser\features\flyweb@mozilla.org.xpi
{firefox}\browser\features\aushelper@mozilla.org.xpi

I believe we want those in there- I would like to find a way to not report failures on these others- it looks like we are writing bytes for many of these (I assume cache), and we should reduce our failures.

Look at the list of bugs that this blocks for more info
:igoldan, is this something you could look into- this would resolve 10 intermittent bugs and probably prevent another 10 in the coming months.
Flags: needinfo?(igoldan)
Yes, I want to look over this bug. Thanks for pointing this issue out!
Flags: needinfo?(igoldan)
Assignee: nobody → igoldan
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #0)

> most if not all of these are not useful, the two that we should verify are
> expected are:
> {firefox}\browser\features\flyweb@mozilla.org.xpi
> {firefox}\browser\features\aushelper@mozilla.org.xpi
> 
> I believe we want those in there- I would like to find a way to not report
> failures on these others- it looks like we are writing bytes for many of
> these (I assume cache), and we should reduce our failures.
> 
> Look at the list of bugs that this blocks for more info

What do you mean by cache? In what way is this causing writing of bytes?
I honestly have no idea, I just saw in some of the bugs it mentioned these were write bytes, not read bytes- maybe our logic for whitelist isn't applying to write bytes
Well, we don't care about these read/write bytes. We only match against file patterns from xperf_whitelist.json. If one of our accessed file's pattern is not in there, we cause an intermittent.
if we can find a generic pattern to ignore the tp5 files, that would help a lot, same with some of the others.
Comment on attachment 8935376 [details]
Bug 1423353 - Repush fix for xperf intermittents

https://reviewboard.mozilla.org/r/206274/#review211874

just one nit to consider, overall this looks good and I would like to see this landed sooner rather than later :)

::: testing/talos/talos/xtalos/etlparser.py:39
(Diff revision 1)
> +    # with their real representations. So, prepend them with extra backslash.
> +    # Read more: https://docs.python.org/2.7/library/re.html#re.sub
> +    (re.compile(r'{\w{8}-\w{4}-\w{4}-\w{4}-\w{12}}'), '{uuid}'),
> +    (re.compile(r'talos\\tests\\tp5n\\.*'), r'talos\\tests\{tp5n_files}'),
> +    (re.compile(r'nvidia corporation\\3d vision\\.*'), '{nvidia_3d_vision}'),
> +    (re.compile(r'cltbld\.t-w732-ix-\d+\.\d+'), '{cltbld}'),

I would like to be careful here, cltbld could have other real files we care about, I think we just want to blanket substitute the cache related files.

::: testing/talos/talos/xtalos/etlparser.py:477
(Diff revision 1)
>  #                              " %s"
>  #                              % (filename, (files[row]['DiskReadCount'] +
>  #                                            files[row]['DiskWriteCount']),
>  #                                 wl[filename]['maxcount']))
>          else:
> -            errors.append("File '%s' was accessed and we were not expecting"
> +            errors.append("File '%s' (normalized from '%s') was accessed and we were not expecting"

this is really great to see!  good work on this part
Attachment #8935376 - Flags: review?(jmaher) → review+
Comment on attachment 8935376 [details]
Bug 1423353 - Repush fix for xperf intermittents

https://reviewboard.mozilla.org/r/206274/#review211874

> I would like to be careful here, cltbld could have other real files we care about, I think we just want to blanket substitute the cache related files.

This will still check the full path for these kinds of files. It only normalizes the variable part, which is the machine instance name. We will still get intermittents for unexpected files accesses under this directory.
This will be backed out, as xperf reports zero values from a badly handled ValueError.
Backed out as requested by igoldan.

https://hg.mozilla.org/mozilla-central/rev/a08e1277507b4a72049c758b09a23f51ddc51a14
Flags: needinfo?(igoldan)
Flags: needinfo?(igoldan)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla59 → ---
:igoldan, what is the status of this (is there a replacement patch in the works since the original was backed out)? Thanks
Flags: needinfo?(igoldan)
Whiteboard: [PI:December] → [PI:January]
Yes, I'm working on adding the fixes. I have to update the way I do imports from the xtalos sub-package. By tomorrow this should be done.
Flags: needinfo?(igoldan)
Comment on attachment 8935376 [details]
Bug 1423353 - Repush fix for xperf intermittents

https://reviewboard.mozilla.org/r/206274/#review217476

r+ again
Pushed by igoldan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f5b7b31c4da1
Repush fix for xperf intermittents r=jmaher
https://hg.mozilla.org/mozilla-central/rev/f5b7b31c4da1
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
See Also: → 1429687
You need to log in before you can comment on or make changes to this bug.