Closed Bug 1261199 Opened 9 years ago Closed 8 years ago

Use StructuredOutputParser for parsing reftest test results

Tracking

(firefox51 fixed)

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

firefox51

---

fixed

People

(Reporter: ahal, Assigned: pyang)

References

Details

Attachments

(1 file, 4 obsolete files)

reftest-structuredlog 9 years ago Paul Yang [:pyang] (away) 3.95 KB, patch		Details \| Diff \| Splinter Review
reftest-structuredlog 9 years ago Paul Yang [:pyang] (away) 6.87 KB, patch		Details \| Diff \| Splinter Review
enable-reftest 8 years ago Paul Yang [:pyang] (away) 2.66 KB, patch		Details \| Diff \| Splinter Review
enable-reftest 8 years ago Paul Yang [:pyang] (away) 3.80 KB, patch	ahal : review+	Details \| Diff \| Splinter Review
enable-reftest 8 years ago Paul Yang [:pyang] (away) 3.84 KB, patch	pyang : review+	Details \| Diff \| Splinter Review

Andrew Halberstadt [:ahal]

Reporter

Description

•

9 years ago

Reftest emits structured logs, but mozharness currently doesn't consume them. It is still parsing the old formatted logs using a set of regexes. We should make sure to use the StructuredOutputParser class instead of DesktopUnittestParser: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/testbase.py#502 We'll need to be careful that we don't miss failures that are hidden in unstructured output. This might mean running the same set of regexes when we come across unstructured output.

Paul Yang [:pyang] (away)

Assignee

Comment 1

•

9 years ago

Attached patch reftest-structuredlog (obsolete) — Details — Splinter Review

Adding reftest for using structured log

Attachment #8755752 - Flags: feedback?(ahalberstadt)

Paul Yang [:pyang] (away)

Assignee

Comment 2

•

9 years ago

Attached patch reftest-structuredlog (obsolete) — Details — Splinter Review

Attachment #8755752 - Attachment is obsolete: true

Attachment #8755752 - Flags: feedback?(ahalberstadt)

Paul Yang [:pyang] (away)

Assignee

Comment 3

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c8da83bce42e6ef730865dc8e08921136a25322&selectedJob=21354910 try result looks pretty ok.

Assignee: nobody → pyang

Paul Yang [:pyang] (away)

Assignee

Comment 4

•

9 years ago

Andrew, we don't see issues for enabling strcuturedlog for reftest. any comments?

Flags: needinfo?(ahalberstadt)

Andrew Halberstadt [:ahal]

Reporter

Comment 5

•

9 years ago

Awesome! So at the very minimum, I'd like to see: 1) Force a test-failure, make sure job turns orange. You can change an == to a != in the reftest manifest to do this. 2) Force a leak, make sure job turns orange. You can remove an asserts directive in the manifest to do this. 3) Force a crash, make sure job turns red. You can add a proc.kill(sig=singal.SIGABRT) somewhere in the harness to do this (after sleeping a bit to give the harnes a chance to start running tests). For all of the above we also need to make sure the appropriate failure messages appear in the log. I'd recommend doing a push both with and without your patch here, and compare the logs of the two. We really need better tests for all of this, but unfortunately we do not.

Flags: needinfo?(ahalberstadt)

Paul Yang [:pyang] (away)

Assignee

Comment 6

•

9 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #5) > For all of the above we also need to make sure the appropriate failure > messages appear in the log. I'd recommend doing a push both with and without > your patch here, and compare the logs of the two. We really need better > tests for all of this, but unfortunately we do not. does it mean to push 3 x 2 (with & without) pushes? Or we need combinations of those 3 conditions.

Andrew Halberstadt [:ahal]

Reporter

Comment 7

•

9 years ago

No, you don't need to do combinations.. you could even do all 3 in the same push if you wanted (i.e in different chunks)

Paul Yang [:pyang] (away)

Assignee

Comment 8

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f2ba4edb2a272b1aa0804fa344a2171648678fc3&selectedJob=21921047 It works for forcing orange. Working on force crash case.

Paul Yang [:pyang] (away)

Assignee

Comment 9

•

9 years ago

Hi ahal, here are force error changeset and result Force crash changeset https://hg.mozilla.org/try/rev/ff2dc3f806a6ffd1b699e86ac1c6366b07c39a57 treeherder result https://treeherder.mozilla.org/#/jobs?repo=try&revision=ff2dc3f806a6ffd1b699e86ac1c6366b07c39a57 for the red suites, main process was killed so no 'SUITE-END' log.

Flags: needinfo?(ahalberstadt)

Andrew Halberstadt [:ahal]

Reporter

Comment 10

•

9 years ago

Haha nice! I like your time bomb. I see what you were saying earlier.. it would be good to get a little module we could use to test certain conditions like that somewhere so we can re-use it in other places. So it looks like there are a couple things to investigate here: 1) It looks like anytime something goes wrong, no suite_end message is being emitted. We should always be emitting this even if there are failures or crashes 2) When there is a crash, we don't generate a stack trace. Usually there should be a long spew of addresses and a PROCESS-CRASH log line. I'm not sure why this is not happening :/ 3) The output format is slightly different from before. I think this is because we aren't using the customized reftest formatter here: https://dxr.mozilla.org/mozilla-central/source/layout/tools/reftest/output.py#13 We'll need to make sure that is the formatter being used to log to stdout. Let me know if you have anymore questions about these problems, or need a hand looking into them.

Flags: needinfo?(ahalberstadt)

Paul Yang [:pyang] (away)

Assignee

Comment 11

•

9 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #10) > Haha nice! I like your time bomb. I see what you were saying earlier.. it > would be good to get a little module we could use to test certain conditions > like that somewhere so we can re-use it in other places. So it looks like > there are a couple things to investigate here: > > 1) It looks like anytime something goes wrong, no suite_end message is being > emitted. We should always be emitting this even if there are failures or > crashes supposedly suite_end should be emitted by runner process. When it was forced killed, does it make sense to send suite_end by manage process? or we need to add signal handler for those situations. > > 2) When there is a crash, we don't generate a stack trace. Usually there > should be a long spew of addresses and a PROCESS-CRASH log line. I'm not > sure why this is not happening :/ > ya agree. Should add more log/stack trace while process lost. > 3) The output format is slightly different from before. I think this is > because we aren't using the customized reftest formatter here: > https://dxr.mozilla.org/mozilla-central/source/layout/tools/reftest/output. > py#13 > > We'll need to make sure that is the formatter being used to log to stdout. Not really familiar with its format. Will check if we need to add another patch for that. > Let me know if you have anymore questions about these problems, or need a > hand looking into them. Thanks! Those are really helpful :D

Paul Yang [:pyang] (away)

Assignee

Updated

•

9 years ago

Depends on: 1286513

Paul Yang [:pyang] (away)

Assignee

Comment 12

•

8 years ago

Attached patch enable-reftest (obsolete) — Details — Splinter Review

leave only config patch here.

Attachment #8755827 - Attachment is obsolete: true

Paul Yang [:pyang] (away)

Assignee

Comment 13

•

8 years ago

Attached patch enable-reftest (obsolete) — Details — Splinter Review

Adding structured log enabling part

Attachment #8772705 - Attachment is obsolete: true

Paul Yang [:pyang] (away)

Assignee

Comment 14

•

8 years ago

try result by latest patch https://treeherder.mozilla.org/#/jobs?repo=try&revision=f15406fc51f9fd5f1a536c15bfb15be8eb3b48cd

Paul Yang [:pyang] (away)

Assignee

Comment 15

•

8 years ago

Comment on attachment 8779692 [details] [diff] [review] enable-reftest I re-triggered failed tests and looks green. Try wasn't stable in these couple of days, maybe we can run them again.

Attachment #8779692 - Flags: review?(ahalberstadt)

Andrew Halberstadt [:ahal]

Reporter

Comment 16

•

8 years ago

Comment on attachment 8779692 [details] [diff] [review] enable-reftest Review of attachment 8779692 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, this looks good! I still want to do a bit of testing before we land this though, I'll do a new try run. For future reference, you can use '-u reftest,crashtest,jsreftest' for your try push, as we only care about those suites with this patch.

Attachment #8779692 - Flags: review?(ahalberstadt) → review+

Andrew Halberstadt [:ahal]

Reporter

Comment 17

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=31f11b48774b I'll do some local tests tomorrow too.

Andrew Halberstadt [:ahal]

Reporter

Comment 18

•

8 years ago

Nice that try run includes some known intermittents and a crash, both seem to get logged correctly! Paul could you upload a new patch with a proper commit message?

Flags: needinfo?(pyang)

Andrew Halberstadt [:ahal]

Reporter

Comment 19

•

8 years ago

I did find one minor "problem" that we need some sheriff input on. When tests fail to start, or there is a crash before the tests finish, DesktopUnittestOutputParser prints "No tests run or summary not found": https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/unittest.py#191 However when this happens with the StructuredOutputParser, it prints either "No checks run" and/or "No suite_end message was emitted by this harness": https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/structuredlog.py#15 The StructuredOutputParser behaviour is strictly better than the old message, as it can differentiate between "tests didn't start" and "tests didn't finish". There are two problems with changing this message though: 1) Treeherder regexes need to be updated to flag these two new messages 2) Existing intermittent bugs filed with the old message will no longer be starrable Problem 1) should be fixed either way, it's just a matter of do we need to block on it. Problem 2) will cause some pain for the sheriffs. A potential compromise would be to temporarily emit all three messages from the StructuredOutputParser, which will give sheriffs some time to migrate over to the new message from the old. Philor, Wes, what would you prefer we do here?

Flags: needinfo?(wkocher)

Flags: needinfo?(philringnalda)

Andrew Halberstadt [:ahal]

Reporter

Updated

•

8 years ago

Depends on: 1294703

Phil Ringnalda (:philor)

Comment 20

•

8 years ago

If you look at the existing bugs on the current message, you'll see that nobody is working on any of them, and that there's no understanding whatsoever of what the message means or why it is output or when it should or should not be starred or which bug it should be starred as, so it won't hurt anything if these classes of failure have no starrable message for a while.

Flags: needinfo?(philringnalda)

Andrew Halberstadt [:ahal]

Reporter

Comment 21

•

8 years ago

Thanks good to know. Also I'm an idiot, treeherder already flags the two new message because they are logged with ERROR. I didn't pay close enough attention to the failure summary. This means problem 1) is already solved. And given that starring the old message is already terrible, I guess there's nothing extra to be done here.

Flags: needinfo?(wkocher)

Paul Yang [:pyang] (away)

Assignee

Comment 22

•

8 years ago

Attached patch enable-reftest — Details — Splinter Review

modifying only commit message. carry on review. Thanks!

Attachment #8779692 - Attachment is obsolete: true

Flags: needinfo?(pyang)

Attachment #8780967 - Flags: review+

Pulsebot

Comment 23

•

8 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/95aef368faed Use StructuredOutputParser for parsing reftest test results; r=ahal

Andrew Halberstadt [:ahal]

Reporter

Comment 24

•

8 years ago

Thought I should mention that I tested this by writing a patch that runs both the DesktopUnittestOutputParser and the StructuredOutputParser on the same log. It then compares the TBPL status of both, and passes the job if they're the same, and fails the job if they're different: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f3abf6cb08b972ba70af6f0941f0d6e2d2bfda7d All green means that both output parsers appear to be flagging the same sets of errors. It's not 100% fool proof, but much better than manually comparing logs.

Phil Ringnalda (:philor)

Comment 25

•

8 years ago

It should have found a difference on 330 seconds without output hangs, since we have been failing to actually parse anything for them other than the utterly useless "crash" that we cause after them, since somewhere around February, a thing which this now fixed.

Andrew Halberstadt [:ahal]

Reporter

Comment 27

•

8 years ago

Yeah, I wasn't able to pipe logs that were printed directly to stdout (i.e not through the logger) to the StructuredOutputParser in my test patch. I presume the 330s timeout message was one of those. But when the StructuredOutputParser encounters one of these in production, it passes it on to the base class. Which means unstructured output is now going through the base OutputParser instead of DesktopUnittestOutputParser. This is probably what inadvertently fixed it.

Ryan VanderMeulen [:RyanVM]

Comment 28

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/95aef368faed

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox51: --- → fixed

Resolution: --- → FIXED

Andrew Halberstadt [:ahal]

Reporter

Updated

•

8 years ago

Depends on: 1313265

James Graham [:jgraham]

Updated

•

8 years ago

Depends on: 1321127

Andrew Halberstadt [:ahal]

Reporter

Updated

•

8 years ago

Depends on: 1325148

You need to log in before you can comment on or make changes to this bug.