1614805 - Windows 10 xperf tasks are sometimes reporting all zeros for results (but still passes)

Reporter

Description

•

5 years ago

I noticed in running xperf tests on the Talos servers, that sometimes a run will report every result as zero, but it still passes

This is throwing off the average/standard deviation calculations when doing the comparison, and this can lead to some quite big deltas (e.g. 20%).

It seems only to happen on Windows 10 Quantum Renderer builds, with shippable and non-shippable.

On some try runs I had one or two "all-zeros" per 5 runs.

From the log it looks like we know there's something wrong, but it doesn't get reported as an actual error.

Really there's probably two issues here:

The fact the error isn't reported as a failure.
The error that caused this in the first place.

Example log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=288241114&repo=try&lineNumber=1820

[task 2020-02-10T18:29:39.624Z] 18:29:39     INFO -  Merged Etl: test.etl
[task 2020-02-10T18:31:21.234Z] 18:31:21     INFO -  reading etl filename: test.etl
[task 2020-02-10T18:31:21.234Z] 18:31:21     INFO -  etlparser: in readfile: test.etl.csv
[task 2020-02-10T18:31:21.234Z] 18:31:21    ERROR -  Traceback (most recent call last):
[task 2020-02-10T18:31:21.234Z] 18:31:21     INFO -    File "Z:\task_1581358975\build\tests\talos\talos/xtalos/parse_xperf.py", line 115, in <module>
[task 2020-02-10T18:31:21.234Z] 18:31:21     INFO -      main()
[task 2020-02-10T18:31:21.234Z] 18:31:21     INFO -    File "Z:\task_1581358975\build\tests\talos\talos/xtalos/parse_xperf.py", line 109, in main
[task 2020-02-10T18:31:21.234Z] 18:31:21     INFO -      **args.__dict__)
[task 2020-02-10T18:31:21.235Z] 18:31:21     INFO -    File "Z:\task_1581358975\build\tests\talos\talos/xtalos/parse_xperf.py", line 90, in stop_from_config
[task 2020-02-10T18:31:21.235Z] 18:31:21     INFO -      processID=kwargs['processID'])
[task 2020-02-10T18:31:21.235Z] 18:31:21     INFO -    File "Z:\task_1581358975\build\tests\talos\talos\xtalos\etlparser.py", line 573, in etlparser_from_config
[task 2020-02-10T18:31:21.235Z] 18:31:21     INFO -      etlparser(**args)
[task 2020-02-10T18:31:21.235Z] 18:31:21     INFO -    File "Z:\task_1581358975\build\tests\talos\talos\xtalos\etlparser.py", line 355, in etlparser
[task 2020-02-10T18:31:21.235Z] 18:31:21     INFO -      trackThreadNetIO(row, io, stage)

Mark Banner (:standard8)

Reporter

Updated

•

5 years ago

Summary: Windows 10 Quantum xperf tasks are sometimes reporting all zeros for results (but still passes) → Windows 10 Quantum Renderer xperf tasks are sometimes reporting all zeros for results (but still passes)

Greg Mierzwinski [:sparky]

Updated

•

5 years ago

Priority: -- → P2

Whiteboard: [perftest:triage]

Mark Banner (:standard8)

Reporter

Comment 1

•

5 years ago

This isn't just Quantum Renderer builds, this also happens on non-QR builds as well, e.g. https://treeherder.mozilla.org/#/jobs?repo=try&revision=648731c016dec86f16285e4207b22b57c4a2903d&selectedJob=289549501

Summary: Windows 10 Quantum Renderer xperf tasks are sometimes reporting all zeros for results (but still passes) → Windows 10 xperf tasks are sometimes reporting all zeros for results (but still passes)

Greg Mierzwinski [:sparky]

Comment 2

•

5 years ago

:bebe, can someone from your team see if this error has an easy solution? If it doesn't, can you provide details for how/why it isn't simple to fix?

Flags: needinfo?(fstrugariu)

Greg Mierzwinski [:sparky]

Updated

•

5 years ago

Whiteboard: [perftest:triage]

Florin Strugariu [:Bebe]

Updated

•

5 years ago

Assignee: nobody → aionescu

Status: NEW → ASSIGNED

Flags: needinfo?(fstrugariu)

Florin Strugariu [:Bebe]

Updated

•

5 years ago

Priority: P2 → P1

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Updated

•

5 years ago

Regressions: 1621233

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 3

•

5 years ago

•

Edited

The trackThreadNetIO(row, io, stage) error doesn't seem to be consistent.
The pattern seems to look like this:
Windows 7 Shippable opt: zeroes on

tp5n main_normal_netio opt e10s stylo INTERMITTENT,
tp5n main_startup_netio opt e10s stylo

Windows 10 x64 Shippable opt: zeroes on

tp5n main_normal_netio opt e10s stylo INTERMITTENT,
tp5n main_startup_netio opt e10s stylo,
tp5n mainthread_writebytes opt e10s stylo,
tp5n mainthread_writecount opt e10s stylo

Windows 10 x64 QuantumRender Shippable opt: zeroes on

tp5n main_normal_netio opt e10s stylo INTERMITTENT,
tp5n main_startup_netio opt e10s stylo,
tp5n mainthread_writebytes opt e10s stylo,
tp5n mainthread_writecount opt e10s stylo

Ongoing investigation. Have to look further on all subtests to determine any more intermittens.

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 4

•

5 years ago

•

Edited

Mark, I didn't find any more xperf jobs passing with the error. Also I can access the log in your description, looks like it's been removed from storage. Could you point me to a log that is actually available?
Also, seems like those talos tests were designed to report zeroes. The graph contains zeroes since 1 year ago (on autoland) for almost all xperf tests on all winfows platforms.

Aaron, I know you're the owner of those tests. Any suggestion?

Flags: needinfo?(aklotz)

Mark Banner (:standard8)

Reporter

Comment 5

•

5 years ago

The netio variants seem to always produce zero (or nearly always), so you can ignore those.

You need to look at the main_normal_fileio and main_startup_fileio values:

https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&selected=1922959,1074880601&series=autoland,1922967,1,1&series=autoland,1922959,1,1&timerange=2592000

These are the direct links to the latest two failures:

https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=292924730&group_state=expanded&revision=cb1605f13d5f56252b6dc89e2934f4c4aedac298
https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=292721403&group_state=expanded&revision=d28ceb16f6839c79662e1eb355584d87a9810d60

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Updated

•

5 years ago

Priority: P1 → P2

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 6

•

5 years ago

Waiting for :aklotz reply in order to proceed.

(No longer employed by Mozilla) Aaron Klotz

Comment 7

•

5 years ago

Attached file Microsoft-Windows-TCPIP event names — Details

Two issues here: * Obviously silently warning about those failures is bad and that should be fixed. * Re: the regexp failure. Here is a dump of the fields generated by the `Microsoft-Windows-TCPIP` provider: ``` ```

(No longer employed by Mozilla) Aaron Klotz

Comment 8

•

5 years ago

Two issues here:

Obviously silently warning about those failures is bad and that should be fixed.
Re: the regexp failure. I have attached the event names that came from my machine running Windows 10 Fall 2019 Update. Note that sometimes xperf event names and behaviours change between Windows versions. Is there any sort of correlation between the xperf failures and the version of the OS on which the test is running?

Flags: needinfo?(aklotz)

:Gijs (out for now; he/him)

Comment 9

•

5 years ago

Ni for comment #8

Flags: needinfo?(aionescu)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 10

•

5 years ago

•

Edited

I wanted to know if these silent failures are intended or not. I am going to investigate further to find out how can we fix this and then to answer the question about correlation between the xperf failures and the version of OS.

Flags: needinfo?(aionescu)

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

5 years ago

Priority: P2 → P1

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 11

•

5 years ago

(In reply to Aaron Klotz [:aklotz] from comment #8)

Two issues here:

Obviously silently warning about those failures is bad and that should be fixed.

Aaron, should I understand that the zero-values reported are fine and just the failures should be fixed?

Flags: needinfo?(aklotz)

(No longer employed by Mozilla) Aaron Klotz

Comment 12

•

5 years ago

The zero values are not fine, but if you fix the failures they should go away.

Flags: needinfo?(aklotz)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 13

•

5 years ago

(In reply to Aaron Klotz [:aklotz] from comment #8)

Re: the regexp failure. I have attached the event names that came from my machine running Windows 10 Fall 2019 Update. Note that sometimes xperf event names and behaviours change between Windows versions. Is there any sort of correlation between the xperf failures and the version of the OS on which the test is running?

I don't know. What can I do is to add a print to the event that's not matching the pattern and follow the logs when the intermittent appears again.

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 14

•

5 years ago

•

Edited

Aaron, were does etl_output.csv come from? The filename is hard-coded and seems to be no track of it anywhere else. It contains the zeroes for writecount and writebytes. Could you please give me more insights about how is this file created?

Flags: needinfo?(aklotz)

(No longer employed by Mozilla) Aaron Klotz

Comment 15

•

5 years ago

https://searchfox.org/mozilla-central/rev/4d2a9d5dc8f0e65807ee66e2b04c64596c643b7a/testing/talos/talos/xtalos/etlparser.py#547

Flags: needinfo?(aklotz)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 16

•

5 years ago

•

Edited

Aaron, looks like the conditions in the lambda function cover only the events with no DiskWriteCount and DiskWriteBytes:

[task 2020-04-08T07:30:15.003Z] 07:30:15     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\favicons.sqlite', '6732 (main)', 'startup') {'DiskReadBytes': 196808, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 8}
[task 2020-04-08T07:30:15.003Z] 07:30:15     INFO -  ('Z:\\task_1586329312\\build\\application\\firefox\\defaults\\pref\\channel-prefs.js', '6732 (main)', 'startup') {'DiskReadBytes': 866, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 2}
[task 2020-04-08T07:30:15.003Z] 07:30:15     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\permissions.sqlite', '6732 (main)', 'startup') {'DiskReadBytes': 4392, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 12}
[task 2020-04-08T07:30:15.003Z] 07:30:15     INFO -  ('Z:\\task_1586329312\\AppData\\Roaming\\Mozilla\\firefox\\Crash Reports\\InstallTime20200408054848', '6732 (main)', 'startup') {'DiskReadBytes': 20, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 2}
[task 2020-04-08T07:30:15.003Z] 07:30:15     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\key4.db', '6732 (main)', 'startup') {'DiskReadBytes': 196936, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 16}
[task 2020-04-08T07:30:15.003Z] 07:30:15     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\favicons.sqlite-shm', '6732 (main)', 'startup') {'DiskReadBytes': 65536, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 2}
[task 2020-04-08T07:30:15.003Z] 07:30:15     INFO -  ('Z:\\task_1586329312\\build\\application\\firefox\\dependentlibs.list', '6732 (main)', 'startup') {'DiskReadBytes': 16384, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 4}
[task 2020-04-08T07:30:15.003Z] 07:30:15     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\prefs.js', '6732 (main)', 'startup') {'DiskReadBytes': 28970, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 2}
[task 2020-04-08T07:30:15.003Z] 07:30:15     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\cookies.sqlite', '6732 (main)', 'startup') {'DiskReadBytes': 131272, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 6}
[task 2020-04-08T07:30:15.004Z] 07:30:15     INFO -  ('C:\\Windows\\System32\\InputHost.dll', '6732 (main)', 'startup') {'DiskReadBytes': 32768, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 2}
[task 2020-04-08T07:30:15.004Z] 07:30:15     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\user.js', '6732 (main)', 'startup') {'DiskReadBytes': 18288, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 2}
[task 2020-04-08T07:30:15.004Z] 07:30:15     INFO -  ('C:\\Windows\\System32\\spool\\drivers\\color\\sRGB Color Space Profile.icm', '6732 (main)', 'startup') {'DiskReadBytes': 8192, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 2}
[task 2020-04-08T07:30:15.004Z] 07:30:15     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\cert9.db', '6732 (main)', 'startup') {'DiskReadBytes': 131336, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 10}
[task 2020-04-08T07:30:15.004Z] 07:30:15     INFO -  ('C:\\Windows\\System32\\setupapi.dll', '6732 (main)', 'startup') {'DiskReadBytes': 88064, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 8}
[task 2020-04-08T07:30:15.004Z] 07:30:15     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\xulstore\\data.mdb', '6732 (main)', 'startup') {'DiskReadBytes': 1216, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 8}
[task 2020-04-08T07:30:15.004Z] 07:30:15     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\places.sqlite-shm', '6732 (main)', 'startup') {'DiskReadBytes': 65536, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 2}
[task 2020-04-08T07:30:16.992Z] 07:30:16     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\places.sqlite', '6732 (main)', 'startup') {'DiskReadBytes': 262344, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 10}
[task 2020-04-08T07:30:16.992Z] 07:30:16     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\pkcs11.txt', '6732 (main)', 'startup') {'DiskReadBytes': 16384, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 4}
[task 2020-04-08T07:30:16.993Z] 07:30:16     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\cookies.sqlite-shm', '6732 (main)', 'startup') {'DiskReadBytes': 65536, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 2}
[task 2020-04-08T07:30:16.993Z] 07:30:16     INFO -  ('C:\\Users\\task_1586329312\\AppData\\Local\\Temp\\tmpmpvoz_\\profile\\compatibility.ini', '6732 (main)', 'startup') {'DiskReadBytes': 446, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 2}

We probably need to revisit that logic in order to get rid of the zero-values.

There's this lambda function that filters those values and looks like this:

filekeys = filter(lambda x: (all_stages or x[2] == stages[0]) and
                            (all_threads or x[1].endswith("(main)")) and
                            (all_stages and x[2] != stages[0] or
                             not checkWhitelist(x[0], whitelist)),
                  files.iterkeys())

If we take favicons.sqlite and look at the unfiltered values from etl_output.csv it looks like this:

('C:\\Users\\task_1586182132\\AppData\\Local\\Temp\\tmpajjk25\\profile\\favicons.sqlite', '9912', 'all'): 
    {'DiskReadBytes': 458952, 'DiskWriteCount': 94, 'DiskWriteBytes': 4915200, 'DiskReadCount': 16},
('C:\\Users\\task_1586182132\\AppData\\Local\\Temp\\tmpajjk25\\profile\\favicons.sqlite', '11180 (main)', 'all'): 
    {'DiskReadBytes': 196808, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 8},
('C:\\Users\\task_1586182132\\AppData\\Local\\Temp\\tmpajjk25\\profile\\favicons.sqlite', '9912', 'normal'): 
    {'DiskReadBytes': 458952, 'DiskWriteCount': 94, 'DiskWriteBytes': 4915200, 'DiskReadCount': 16},
('C:\\Users\\task_1586182132\\AppData\\Local\\Temp\\tmpajjk25\\profile\\favicons.sqlite', '11180 (main)', 'startup'): 
    {'DiskReadBytes': 196808, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 8},
('C:\\Users\\task_1586182132\\AppData\\Local\\Temp\\tmpajjk25\\profile\\favicons.sqlite', '9912', 'startup'): 
    {'DiskReadBytes': 262344, 'DiskWriteCount': 0, 'DiskWriteBytes': 0, 'DiskReadCount': 10},

DiskWriteCount and DiskWriteBytes are greater than 0 for all and normal stages. Looking at the code for debug mode:

if debug:
    # in debug, we want stages = [startup+normal] and all threads, not just (main)
    # we will use this data to upload fileIO info to blobber only for debug mode
    outputData = filter(lambda x: (all_stages or x[2] in [stages[0], stages[1]]) and
                                  (all_stages and x[2] not in [stages[0], stages[1]] or
                                   not checkWhitelist(x[0], whitelist)),
                        files.iterkeys())

we can see that it includes stages[1] in x[2], which is exactly the normal stage.
I don't have thorough enough knowledge about this code to tell if this is the right fix, but is the closest fix I found so far (inclusing stages[1] in the expression against x[2]).

Thoughts?

Flags: needinfo?(aklotz)

(No longer employed by Mozilla) Aaron Klotz

Comment 17

•

5 years ago

Since this code landed 7 years ago, so I am having trouble believing that the emergence of this issue is with the filtering. I am happy to be convinced otherwise.

I'm confused, though: based on the failures in comment 5, we're seeing AttributeErrors because of a regex that isn't matching; the script is bailing out, hence we aren't producing good data. My expectation was that once we fixed that, these zeros would go away. Is this data from comment 16 coming from code that includes the regex fix?

Also, keep in mind what this test is actually seeking: I/O reads and writes on the main thread are bad, especially when they occur during startup. We segment the timeline into the startup, normal, shutdown stages to assist us with determining how "bad" the main thread I/O is. Changes in non-main thread I/O will happen all the time, and we don't really care about that unless there is some kind of weird, unreasonable spike. Keeping this in mind, in an ideal world where we spectacularly pass the test with flying colours, we would want zeros for all stages on the main thread. In practice that is not 100% possible, hence we have the whitelist. And of course, 0 I/O across all threads and all stages is clearly an error, since a web browser that doesn't do any I/O is not much of a web browser at all.

Flags: needinfo?(aklotz)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 18

•

5 years ago

•

Edited

Aaron, the failures in comment 5 are indeed due to e regex but are very rare and what I did is to add a try catch to be able to see the string that is failing. However, the zero-values aren't caused by that as they are permanent (they happen also when the matching error doesn't). So this issue has 2 culprits: silent failure because of the re.match and the zero-values whose occurrence I explained in comment 15. With the mention that the zero-values happen since ever while the match failure is relatively recent.

As we can't really see what event name is causing the silent fail, my proposal is to land a patch adding a try/except to the re.match as the intermittent occurrence is fairly low.

I wouldn't wait one more week, as this patch is fairly easy to review, and it could buy us time to advance with the investigations about zero-values.
Thanks for your answer, Aaron.

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 19

•

5 years ago

Attached file Bug 1614805 Part 1: catch re.match error r?#perftest,bebe — Details

Phabricator Automation

Updated

•

5 years ago

Attachment #9140114 - Attachment description: Bug 1614805 Part 1: catch re.match error → Bug 1614805 Part 1: catch re.match error r?bebe,#perftest

Phabricator Automation

Updated

•

5 years ago

Attachment #9140114 - Attachment description: Bug 1614805 Part 1: catch re.match error r?bebe,#perftest → Bug 1614805 Part 1: catch re.match error r?#perftest,bebe

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Updated

•

5 years ago

Keywords: leave-open

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 20

•

5 years ago

Note to self to remove the leave-open keyword after this is fixed.

Flags: needinfo?(aionescu)

Pulsebot

Comment 21

•

5 years ago

Pushed by aionescu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f2c1f62b1f65 Part 1: catch re.match error r=perftest-reviewers,Bebe

Oana Pop-Rus

Comment 22

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f2c1f62b1f65

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 23

•

5 years ago

(In reply to Mark Banner (:standard8) from comment #5)

The netio variants seem to always produce zero (or nearly always), so you can ignore those.

You need to look at the main_normal_fileio and main_startup_fileio values:

https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&selected=1922959,1074880601&series=autoland,1922967,1,1&series=autoland,1922959,1,1&timerange=2592000

These are the direct links to the latest two failures:

https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=292924730&group_state=expanded&revision=cb1605f13d5f56252b6dc89e2934f4c4aedac298
https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=292721403&group_state=expanded&revision=d28ceb16f6839c79662e1eb355584d87a9810d60

Mark, from what you're saying, this bug was open because of the zero-values reported by the silent re.match error and we don't care about others. Is that right?

Flags: needinfo?(aionescu) → needinfo?(standard8)

Mark Banner (:standard8)

Reporter

Comment 24

•

5 years ago

(In reply to (PTO until April 20th) Alexandru Ionescu (needinfo me) :alexandrui from comment #23)

Mark, from what you're saying, this bug was open because of the zero-values reported by the silent re.match error and we don't care about others. Is that right?

The ones we care in this bug are the ones where we have all-zeros reported across all tests. I don't know if anything should be done about the sub-tests that report zeros all the time - probably nothing based on the previous discussions.

Flags: needinfo?(standard8)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 25

•

5 years ago

Meanwhile I landed a patch meant to log more details about the regex matching error and found this job reporting that the event name causing the error looks like this:
Error matching event Microsoft-Windows-TCPIP//win:Info: 'NoneType' object has no attribute 'group'
The event name with prefix and suffix is Microsoft-Windows-TCPIP//win:Info and the event name alone seems to be an empty string .

I am not sure what I should understand from here. Aaron, could you please help on this?

Flags: needinfo?(aklotz)

(No longer employed by Mozilla) Aaron Klotz

Comment 26

•

5 years ago

Microsoft-Windows-TCPIP//win:Info is different from the other event names that I posted in the attachment.

The regex in trackThreadNetIO tries to capture the content between the slashes, but in this case the content between the slashes is empty.

Flags: needinfo?(aklotz)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 27

•

5 years ago

•

Edited

(In reply to Aaron Klotz [:aklotz] from comment #26)

Microsoft-Windows-TCPIP//win:Info is different from the other event names that I posted in the attachment.

The regex in trackThreadNetIO tries to capture the content between the slashes, but in this case the content between the slashes is empty.

I mean, in the context of why is this failing the regular expression matching, is this supposed to happen? Is this normal? Should the code cover this empty match continue the execution or fail it as not normal?

(In reply to Aaron Klotz [:aklotz] from comment #17)

Also, keep in mind what this test is actually seeking: I/O reads and writes on the main thread are bad, especially when they occur during startup.

Considering this, the only concerning zeroes are indeed the ones provoked by the matching error during startup for windows10-64-shippable, windows10-64 and windows10-64-qr. So once we get to the same page with the questions above, we can identify a solution.

Flags: needinfo?(aklotz)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 28

•

5 years ago

Attached file Bug 1614805 Part 2: modify the re.match expression to cover the empty event r?#perftest,bebe — Details

(No longer employed by Mozilla) Aaron Klotz

Comment 29

•

5 years ago

(In reply to Alexandru Ionescu (needinfo me) :alexandrui from comment #27)

I mean, in the context of why is this failing the regular expression matching, is this supposed to happen? Is this normal? Should the code cover this empty match continue the execution or fail it as not normal?

We already filter events based on the text between the slashes, so an empty string would be treated identically to any other string that we don't care about. So yes, we should just produce an empty string and continue execution.

Flags: needinfo?(aklotz)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 30

•

5 years ago

Aaron, there's a piece of code here which is continuing the execution if the while-list file is not found, just like it was for the events not fitting in the regular expression. It is prone to false results and can misslead. I would raise an exception and fail the execution of the code instead of returning.

Flags: needinfo?(aklotz)

Pulsebot

Comment 31

•

5 years ago

Pushed by aionescu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d3c0d7256d20 Part 2: modify the re.match expression to cover the empty event r=perftest-reviewers,sparky

(No longer employed by Mozilla) Aaron Klotz

Comment 32

•

5 years ago

That's fine.

Flags: needinfo?(aklotz)

Andreea Pavel [:apavel]

Comment 33

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d3c0d7256d20

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Updated

•

5 years ago

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

Resolution: --- → FIXED

BugBot [:suhaib / :marco/ :calixte]

Updated

•

5 years ago

Keywords: leave-open

Microsoft-Windows-TCPIP event names 5 years ago (No longer employed by Mozilla) Aaron Klotz 4.17 KB, text/plain		Details
Bug 1614805 Part 1: catch re.match error r?#perftest,bebe 5 years ago Alexandru Ionescu (needinfo me) [:alexandrui] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1614805 Part 2: modify the re.match expression to cover the empty event r?#perftest,bebe 5 years ago Alexandru Ionescu (needinfo me) [:alexandrui] 47 bytes, text/x-phabricator-request		Details \| Review