Use StructuredOutputParser for parsing reftest test results

RESOLVED FIXED

Status

RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: ahal, Assigned: pyang)

Tracking

(Blocks: 1 bug)

unspecified
Dependency tree / graph

Firefox Tracking Flags

(firefox51 fixed)

Details

Attachments

(1 attachment, 4 obsolete attachments)

(Reporter)

Description

3 years ago
Reftest emits structured logs, but mozharness currently doesn't consume them. It is still parsing the old formatted logs using a set of regexes. We should make sure to use the StructuredOutputParser class instead of DesktopUnittestParser:
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/testbase.py#502

We'll need to be careful that we don't miss failures that are hidden in unstructured output. This might mean running the same set of regexes when we come across unstructured output.
(Assignee)

Comment 1

3 years ago
Created attachment 8755752 [details] [diff] [review]
reftest-structuredlog

Adding reftest for using structured log
Attachment #8755752 - Flags: feedback?(ahalberstadt)
(Assignee)

Comment 2

3 years ago
Created attachment 8755827 [details] [diff] [review]
reftest-structuredlog
Attachment #8755752 - Attachment is obsolete: true
Attachment #8755752 - Flags: feedback?(ahalberstadt)
(Assignee)

Comment 4

3 years ago
Andrew, we don't see issues for enabling strcuturedlog for reftest.  any comments?
Flags: needinfo?(ahalberstadt)
(Reporter)

Comment 5

3 years ago
Awesome! So at the very minimum, I'd like to see:

1) Force a test-failure, make sure job turns orange. You can change an == to a != in the reftest manifest to do this.
2) Force a leak, make sure job turns orange. You can remove an asserts directive in the manifest to do this.
3) Force a crash, make sure job turns red. You can add a proc.kill(sig=singal.SIGABRT) somewhere in the harness to do this (after sleeping a bit to give the harnes a chance to start running tests).

For all of the above we also need to make sure the appropriate failure messages appear in the log. I'd recommend doing a push both with and without your patch here, and compare the logs of the two. We really need better tests for all of this, but unfortunately we do not.
Flags: needinfo?(ahalberstadt)
(Assignee)

Comment 6

3 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #5)
> For all of the above we also need to make sure the appropriate failure
> messages appear in the log. I'd recommend doing a push both with and without
> your patch here, and compare the logs of the two. We really need better
> tests for all of this, but unfortunately we do not.

does it mean to push 3 x 2 (with & without) pushes? Or we need combinations of those 3 conditions.
(Reporter)

Comment 7

3 years ago
No, you don't need to do combinations.. you could even do all 3 in the same push if you wanted (i.e in different chunks)
(Assignee)

Comment 9

3 years ago
Hi ahal, here are force error changeset and result

Force crash changeset
https://hg.mozilla.org/try/rev/ff2dc3f806a6ffd1b699e86ac1c6366b07c39a57

treeherder result
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ff2dc3f806a6ffd1b699e86ac1c6366b07c39a57

for the red suites, main process was killed so no 'SUITE-END' log.
Flags: needinfo?(ahalberstadt)
(Reporter)

Comment 10

3 years ago
Haha nice! I like your time bomb. I see what you were saying earlier.. it would be good to get a little module we could use to test certain conditions like that somewhere so we can re-use it in other places. So it looks like there are a couple things to investigate here:

1) It looks like anytime something goes wrong, no suite_end message is being emitted. We should always be emitting this even if there are failures or crashes

2) When there is a crash, we don't generate a stack trace. Usually there should be a long spew of addresses and a PROCESS-CRASH log line. I'm not sure why this is not happening :/

3) The output format is slightly different from before. I think this is because we aren't using the customized reftest formatter here:
https://dxr.mozilla.org/mozilla-central/source/layout/tools/reftest/output.py#13

We'll need to make sure that is the formatter being used to log to stdout. Let me know if you have anymore questions about these problems, or need a hand looking into them.
Flags: needinfo?(ahalberstadt)
(Assignee)

Comment 11

3 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #10)
> Haha nice! I like your time bomb. I see what you were saying earlier.. it
> would be good to get a little module we could use to test certain conditions
> like that somewhere so we can re-use it in other places. So it looks like
> there are a couple things to investigate here:
> 
> 1) It looks like anytime something goes wrong, no suite_end message is being
> emitted. We should always be emitting this even if there are failures or
> crashes
supposedly suite_end should be emitted by runner process. When it was forced killed, does it make sense to send suite_end by manage process?  or we need to add signal handler for those situations.

> 
> 2) When there is a crash, we don't generate a stack trace. Usually there
> should be a long spew of addresses and a PROCESS-CRASH log line. I'm not
> sure why this is not happening :/
> 

ya agree.  Should add more log/stack trace while process lost.

> 3) The output format is slightly different from before. I think this is
> because we aren't using the customized reftest formatter here:
> https://dxr.mozilla.org/mozilla-central/source/layout/tools/reftest/output.
> py#13
> 
> We'll need to make sure that is the formatter being used to log to stdout.
Not really familiar with its format.  Will check if we need to add another patch for that.


> Let me know if you have anymore questions about these problems, or need a
> hand looking into them.
Thanks!  Those are really helpful :D
(Assignee)

Updated

3 years ago
Depends on: 1286513
(Assignee)

Comment 12

2 years ago
Created attachment 8772705 [details] [diff] [review]
enable-reftest

leave only config patch here.
Attachment #8755827 - Attachment is obsolete: true
(Assignee)

Comment 13

2 years ago
Created attachment 8779692 [details] [diff] [review]
enable-reftest

Adding structured log enabling part
Attachment #8772705 - Attachment is obsolete: true
(Assignee)

Comment 15

2 years ago
Comment on attachment 8779692 [details] [diff] [review]
enable-reftest

I re-triggered failed tests and looks green.
Try wasn't stable in these couple of days, maybe we can run them again.
Attachment #8779692 - Flags: review?(ahalberstadt)
(Reporter)

Comment 16

2 years ago
Comment on attachment 8779692 [details] [diff] [review]
enable-reftest

Review of attachment 8779692 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, this looks good! I still want to do a bit of testing before we land this though, I'll do a new try run. For future reference, you can use '-u reftest,crashtest,jsreftest' for your try push, as we only care about those suites with this patch.
Attachment #8779692 - Flags: review?(ahalberstadt) → review+
(Reporter)

Comment 18

2 years ago
Nice that try run includes some known intermittents and a crash, both seem to get logged correctly! Paul could you upload a new patch with a proper commit message?
Flags: needinfo?(pyang)
(Reporter)

Comment 19

2 years ago
I did find one minor "problem" that we need some sheriff input on.

When tests fail to start, or there is a crash before the tests finish, DesktopUnittestOutputParser prints "No tests run or summary not found":
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/unittest.py#191

However when this happens with the StructuredOutputParser, it prints either "No checks run" and/or "No suite_end message was emitted by this harness":
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/structuredlog.py#15

The StructuredOutputParser behaviour is strictly better than the old message, as it can differentiate between "tests didn't start" and "tests didn't finish". There are two problems with changing this message though:

1) Treeherder regexes need to be updated to flag these two new messages
2) Existing intermittent bugs filed with the old message will no longer be starrable

Problem 1) should be fixed either way, it's just a matter of do we need to block on it. Problem 2) will cause some pain for the sheriffs. A potential compromise would be to temporarily emit all three messages from the StructuredOutputParser, which will give sheriffs some time to migrate over to the new message from the old.

Philor, Wes, what would you prefer we do here?
Flags: needinfo?(wkocher)
Flags: needinfo?(philringnalda)
(Reporter)

Updated

2 years ago
Depends on: 1294703
If you look at the existing bugs on the current message, you'll see that nobody is working on any of them, and that there's no understanding whatsoever of what the message means or why it is output or when it should or should not be starred or which bug it should be starred as, so it won't hurt anything if these classes of failure have no starrable message for a while.
Flags: needinfo?(philringnalda)
(Reporter)

Comment 21

2 years ago
Thanks good to know.

Also I'm an idiot, treeherder already flags the two new message because they are logged with ERROR. I didn't pay close enough attention to the failure summary. This means problem 1) is already solved. And given that starring the old message is already terrible, I guess there's nothing extra to be done here.
Flags: needinfo?(wkocher)
(Assignee)

Comment 22

2 years ago
Created attachment 8780967 [details] [diff] [review]
enable-reftest

modifying only commit message. carry on review.
Thanks!
Attachment #8779692 - Attachment is obsolete: true
Flags: needinfo?(pyang)
Attachment #8780967 - Flags: review+

Comment 23

2 years ago
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/95aef368faed
Use StructuredOutputParser for parsing reftest test results; r=ahal
(Reporter)

Comment 24

2 years ago
Thought I should mention that I tested this by writing a patch that runs both the DesktopUnittestOutputParser and the StructuredOutputParser on the same log. It then compares the TBPL status of both, and passes the job if they're the same, and fails the job if they're different:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f3abf6cb08b972ba70af6f0941f0d6e2d2bfda7d

All green means that both output parsers appear to be flagging the same sets of errors. It's not 100% fool proof, but much better than manually comparing logs.
It should have found a difference on 330 seconds without output hangs, since we have been failing to actually parse anything for them other than the utterly useless "crash" that we cause after them, since somewhere around February, a thing which this now fixed.
Duplicate of this bug: 1296162
(Reporter)

Comment 27

2 years ago
Yeah, I wasn't able to pipe logs that were printed directly to stdout (i.e not through the logger) to the StructuredOutputParser in my test patch. I presume the 330s timeout message was one of those.

But when the StructuredOutputParser encounters one of these in production, it passes it on to the base class. Which means unstructured output is now going through the base OutputParser instead of DesktopUnittestOutputParser. This is probably what inadvertently fixed it.

Comment 28

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/95aef368faed
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox51: --- → fixed
Resolution: --- → FIXED
(Reporter)

Updated

2 years ago
Depends on: 1313265
Depends on: 1321127
(Reporter)

Updated

2 years ago
Depends on: 1325148
You need to log in before you can comment on or make changes to this bug.