1343659 - Mozharness should use log level ERROR for rust panics lines

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Description

•

8 years ago

I would like lines like [1] to be recognized as errors in treeherder so that they show up in the failure summary and can be used to inform starring of bugs.

I wrote a patch but I don't think I can test it easily.

[1] https://treeherder.mozilla.org/logviewer.html#?job_id=79852537&repo=graphics&lineNumber=9313

GitHub Autolander Bot

Comment 1

•

8 years ago

Attached file [treeherder] staktrace:rustpanic > mozilla:master (obsolete) — Details

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Updated

•

8 years ago

Attachment #8842588 - Flags: review?(wlachance)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Updated

•

8 years ago

Assignee: nobody → bugmail

Ed Morley [:emorley]

Comment 2

•

8 years ago

Thank you for the PR!

Before we try this approach, would you mind seeing if mozharness can be made to output these lines with a log level of ERROR not INFO? (Since Treeherder already recognises mozharness ERROR prefixes)

ie at the moment the line is incorrectly using INFO:

[task 2017-02-24T04:20:20.368227Z] 04:20:20     INFO - ...

We're trying to move away from having to hardcode every variation of error string in the log parser where possible.

William Lachance (:wlach)

Comment 3

•

8 years ago

Comment on attachment 8842588 [details] [review]
[treeherder] staktrace:rustpanic > mozilla:master

Ed's suggestion would be ideal if it's not too difficult (unfortunately I suspect it is).

Otherwise I think this should be fine to go in with a unit test (see PR for details).

Attachment #8842588 - Flags: review?(wlachance)

Ed Morley [:emorley]

Comment 4

•

8 years ago

I think it should be possible via:
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/errors.py

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/116452/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/116452/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/116454/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/116454/

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 7

•

8 years ago

Thanks all! I'll try to do the change in mozharness along the lines of what Wes posted in his patches. Although his patch seems to modify the "logcat" error strings which sounds like it's Android-specific?

William Lachance (:wlach)

Comment 8

•

8 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7)
> Thanks all! I'll try to do the change in mozharness along the lines of what
> Wes posted in his patches. Although his patch seems to modify the "logcat"
> error strings which sounds like it's Android-specific?

Yeah that doesn't seem right to me. Maybe we need some kind of new category for these types of error strings.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 9

•

8 years ago

Suppose I should have pushed that with http://imgur.com/a/cp4ky

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 10

•

8 years ago

Well, I tried the patch at [1], and the job at [2] has the rust panic line in the output but it didn't get flagged as an error. Is there a way to test this locally? Trial-and-error with try pushes is going to be quite painful.

[1] https://hg.mozilla.org/try/rev/3a255d30ece138d2bab3072dc9069a51a15f6db7
[2] https://treeherder.mozilla.org/logviewer.html#?job_id=81233456&repo=try&lineNumber=2289

William Lachance (:wlach)

Comment 11

•

8 years ago

Andrew, can you help Kats with this? See comment 10.

Flags: needinfo?(ahalberstadt)

Ed Morley [:emorley]

Updated

•

8 years ago

Component: Treeherder: Log Viewer → Mozharness

Product: Tree Management → Release Engineering

QA Contact: jlund

Summary: TreeHerder should recognize rust panics as errors → Mozharness should use log level ERROR for rust panics lines

Ed Morley [:emorley]

Updated

•

8 years ago

Attachment #8842588 - Attachment is obsolete: true

Andrew Halberstadt [:ahal]

Assignee

Comment 12

•

8 years ago

Kats, I think you have it right but it looks like your regex doesn't match because the thread name is single-quoted. Try using:
thread '(\w+)' panicked

Unfortunately there's not a great way of testing this locally. I usually write a script that manually imports and mocks out an OutputParser object, and pipe logs into it from my mach runs.. But it's not really more convenient than pushing to try.

Flags: needinfo?(ahalberstadt)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 13

•

8 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #12)
> Kats, I think you have it right but it looks like your regex doesn't match
> because the thread name is single-quoted. Try using:
> thread '(\w+)' panicked

That didn't work either. Try push at [1], log at [2]. I verified that the regex matches by running this in my shell:
  python -c "import re; print re.compile(r'''thread '([^']+)' panicked''').match(r'''thread 'RenderBackend' panicked at 'called''').group(0)"

[1] https://hg.mozilla.org/try/rev/f85149064dc88f99baf8e57d322bcc6b20df959f
[2] https://treeherder.mozilla.org/logviewer.html#?job_id=81315988&repo=try&lineNumber=2252

Andrew Halberstadt [:ahal]

Assignee

Comment 14

•

8 years ago

Ok, I see why adding a regex to the errorlist isn't working. That log message is being logged via the 'info' level from the structured logger. For example, check out the raw log:
https://public-artifacts.taskcluster.net/fHFkweGmTOOzETCWHMbJuw/0/public/test_info//browser-chrome-gpu_raw.log

Now ctrl-f for 'panicked' and you'll notice it's a 'log' action with 'level': 'info'. Mozharness/treeherder will automatically highlight anything logged with 'error' or above. The 'errorlist' regexes are only used to catch errors in unstructured output. Do you happen to know where that rust output is being logged from?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 15

•

8 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #14)
> Do you happen to know where that rust output
> is being logged from?

The rust output is spit out directly from the firefox binary to stderr, as far as I know. I'm not sure what all the different harness layers are doing with the output along the way.

Andrew Halberstadt [:ahal]

Assignee

Comment 16

•

8 years ago

I spy some mochitest stupidity.. I'm going to file a bug that blocks this, though it might take a bit of time to get right. So if you're in a rush to get this landed and wlach/emorley are ok with it, I'd recommend landing your original patch if possible.

Ed, would it be alright to re-open/land the original pull request for now?

Flags: needinfo?(emorley)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 17

•

8 years ago

I'm not in a particular rush to get this landed. I can wait a bit.

Flags: needinfo?(emorley)

Andrew Halberstadt [:ahal]

Assignee

Updated

•

8 years ago

Depends on: 1344346

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 18

•

8 years ago

Now that bug 1344346 is in m-c, I'll try the patch at [1] again. Let me know if there's more things that need to be fixed first.

[1] https://hg.mozilla.org/try/rev/f85149064dc88f99baf8e57d322bcc6b20df959f

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 19

•

8 years ago

Still not working: https://treeherder.mozilla.org/logviewer.html#?job_id=83460775&repo=try&lineNumber=17598

The same log later has TEST-UNEXPECTED and PROCESS-CRASH, both of which are also "INFO" level despite already being listed as ERROR-level substrings at https://hg.mozilla.org/try/rev/9470d7dab8d2cc5914763a5dbd5b5a5f18e45b0d#l1.10

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 20

•

8 years ago

Flags: needinfo?(ahalberstadt)

Andrew Halberstadt [:ahal]

Assignee

Comment 21

•

8 years ago

Crap, sorry.. thought that would fix it.

I'm PTO until the end of May, so re-directing to jgraham.

Flags: needinfo?(ahalberstadt) → needinfo?(james)

James Graham [:jgraham]

Comment 22

•

8 years ago

Right, this is being logged as process_output from the reftest harness. Which seems prety reasonable, but I think you now need to land your changes in https://hg.mozilla.org/try/rev/3a255d30ece138d2bab3072dc9069a51a15f6db7 so that mozharness flags this process_output as an error. let me know if that doesn't work.

Flags: needinfo?(james)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 23

•

8 years ago

If I understand you correctly, that's exactly what I did in the try push in comment 19, but it still didn't work. Or has more stuff landed in terms of changing the logging since then?

I've kicked off another try push anyway, just to confirm: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cf051186a554bdc95d6354907f0ecd014300f5ac

James Graham [:jgraham]

Comment 24

•

8 years ago

Ah, sorry I forget that people put actual code in the try-cyntax commit (mach try is your friend!). I'll investigate again.

Flags: needinfo?(james)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 25

•

8 years ago

Yeah it's definitely still not working: https://treeherder.mozilla.org/logviewer.html#?job_id=86264921&repo=try&lineNumber=17388

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 26

•

8 years ago

Any update here, James?

James Graham [:jgraham]

Comment 27

•

8 years ago

So the StruturedOutputParser wasn't using this code at all for lines with action="log". The branch in https://treeherder.mozilla.org/#/jobs?repo=try&revision=6b689b80021d22680b2f0da99cc36dfcc0d4f0f6 changes that, which I think will fix this case (the old logs were lost so I can't easily work out if this applies to the original problem).

Can you rerun something that is expected to fail on top of the above changes?

Flags: needinfo?(james)

James Graham [:jgraham]

Updated

•

8 years ago

Flags: needinfo?(bugmail)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 28

•

8 years ago

I kicked off a try push with your patches from the try run above plus a webrender change that should trigger a panic.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d3fc73392c4d94eab0103b82b76178e4ddb29f3a

Leaving needinfo on me until I confirm the panic-inducing change works as intended :)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 29

•

8 years ago

Oh.. it looks like there's an error in one your patches:

[task 2017-04-27T14:48:19.241204Z] Exception in thread Thread-1:
[task 2017-04-27T14:48:19.241282Z] Traceback (most recent call last):
[task 2017-04-27T14:48:19.241410Z]   File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
[task 2017-04-27T14:48:19.241448Z]     self.run()
[task 2017-04-27T14:48:19.241490Z]   File "/usr/lib/python2.7/threading.py", line 754, in run
[task 2017-04-27T14:48:19.241621Z]     self.__target(*self.__args, **self.__kwargs)
[task 2017-04-27T14:48:19.241681Z]   File "/home/worker/workspace/mozharness/mozprocess/processhandler.py", line 733, in _processOutput
[task 2017-04-27T14:48:19.241720Z]     self.processOutputLine(line.rstrip())
[task 2017-04-27T14:48:19.241805Z]   File "/home/worker/workspace/mozharness/mozprocess/processhandler.py", line 697, in processOutputLine
[task 2017-04-27T14:48:19.241836Z]     handler(line)
[task 2017-04-27T14:48:19.241885Z]   File "/home/worker/workspace/mozharness/mozharness/base/script.py", line 1387, in processOutput
[task 2017-04-27T14:48:19.241960Z]     parser.add_lines(line)
[task 2017-04-27T14:48:19.242070Z]   File "/home/worker/workspace/mozharness/mozharness/base/log.py", line 356, in add_lines
[task 2017-04-27T14:48:19.242124Z]     self.parse_single_line(line)
[task 2017-04-27T14:48:19.242182Z]   File "/home/worker/workspace/mozharness/mozharness/mozilla/structuredlog.py", line 90, in parse_single_line
[task 2017-04-27T14:48:19.242366Z]     level = error_check.get(level, getattr(log, data["level"].upper()))
[task 2017-04-27T14:48:19.242417Z] AttributeError: 'NoneType' object has no attribute 'get'

Flags: needinfo?(bugmail)

James Graham [:jgraham]

Comment 30

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=cdb879bf2c4c82434ec143cfe3ef7ea572867b9c perhaps

James Graham [:jgraham]

Comment 31

•

8 years ago

So as far as I can tell the string you are matching never appears in the logs.

Also note that there's a slight issue with this approach; the error will be found by the treeherder log parser but won't end up  in the errorsummary.json file. If you wanted that you would have to change all the harnesses.

Flags: needinfo?(bugmail)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 32

•

8 years ago

(In reply to James Graham [:jgraham] from comment #31)
> So as far as I can tell the string you are matching never appears in the
> logs.

So I tried a few different approaches to triggering the panic. All of them work and produce the string when I run the reftests locally, but they don't show up in the log I see from treeherder. Considering that the string is output by the firefox binary itself, I'm pretty convinced that it's getting lost or eaten somewhere along the way.

The way I'm running it locally is `mach reftest layout/reftests/reftest-sanity` (on a build with gfx.webrender.enabled=true), if that makes any difference.

Flags: needinfo?(bugmail)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 33

•

7 years ago

Ok, I found another way to trigger the panic, and it is showing up in the raw log but not in the TH summary:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0164e3fed763e3d2f2bdf98f8677c86d6d131c8d&selectedJob=97110590

Flags: needinfo?(james)

Andrew Halberstadt [:ahal]

Assignee

Comment 34

•

7 years ago

I think I see what's going on. In bug 1344346 I started running 'process_output' actions through the error lists, but I made sure the base OutputParser class didn't do any actual printing because we still wanted to go through the proper formatter. Because I skipped this step, the 'process_output' still won't be printed with ERROR at the front. Essentially bug 1344346 succeeded in using 'process_output' to determine the worst_level (used for things like the overall return code of the run), but didn't modify the output at all.

I have a patch I'll run through try before submitting.

Flags: needinfo?(james)

Andrew Halberstadt [:ahal]

Assignee

Comment 35

•

7 years ago

This patch [1] seems to get it working. However, in my try run [2], the panic line reads:
GECKO(4580) | thread '<unnamed>' panicked at 'called `Result::unwrap()`

Which doesn't get matched by the '(?:\w+)' part of the regex. I'm not sure why it doesn't know the thread name. Simply using .* does produce an ERROR though. I think I'll file a new bug to get the patch in [1] landed, then we can continue using this one to get the proper regex added.

[1]: https://hg.mozilla.org/try/rev/5652f5690f71a09009d3f135fb8384838faf6abd
[2]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=50e486301204a7d5a86b401eefa4be6c2909323f&selectedJob=101589582

Andrew Halberstadt [:ahal]

Assignee

Updated

•

7 years ago

Depends on: 1367745

Andrew Halberstadt [:ahal]

Assignee

Comment 36

•

7 years ago

Btw, here's a try run using .+ instead, finally with the panicked line highlighted:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=45983434984766aa0f0aa1219a1d62c2da0f5789&selectedJob=102021643

I guess the original '([^']+)' would have worked too.. Either way, we can land whatever regex we want after bug 1367745 is fixed. Should have a patch up over there soon.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 37

•

7 years ago

\o/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/144534/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/144534/

Andrew Halberstadt [:ahal]

Assignee

Comment 39

•

7 years ago

Of course now that all the pieces are aligned, I wasn't able to reproduce the kernel panic to verify the fix:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=195c164fceeaa2f362e7224a2164df3e95432058

This was working in comment 36, though the regex there was slightly different. I tested this regex on the command line so I'm pretty confident it'll work as well.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Updated

•

7 years ago

Assignee: bugmail → ahalberstadt

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 40

•

7 years ago

mozreview-review

Comment on attachment 8873081 [details]
Bug 1343659 - [mozharness] Flag rust kernel panics as an ERROR,

https://reviewboard.mozilla.org/r/144534/#review148388

Ok. I'm happy with landing this. If I run across a problem with panics not being detected we can always revisit this.

Attachment #8873081 - Flags: review?(bugmail) → review+

Pulsebot

Comment 41

•

7 years ago

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/72dfa41fadc1
[mozharness] Flag rust kernel panics as an ERROR, r=kats

Ryan VanderMeulen [:RyanVM]

Comment 42

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/72dfa41fadc1

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 43

•

7 years ago

I have try pushes where this is verified working. Thanks :ahal!

Status: RESOLVED → VERIFIED

Ed Morley [:emorley]

Comment 44

•

7 years ago

Many thanks for persevering with this Andrew!

[treeherder] staktrace:rustpanic > mozilla:master 8 years ago GitHub Autolander Bot 47 bytes, text/x-github-pull-request		Details \| Review
Bug 1343659 - Add the thread panicked string 8 years ago Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed) 59 bytes, text/x-review-board-request		Details
! 8 years ago Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed) 59 bytes, text/x-review-board-request		Details
Bug 1343659 - [mozharness] Flag rust kernel panics as an ERROR, 7 years ago Andrew Halberstadt [:ahal] 59 bytes, text/x-review-board-request	kats : review+	Details