Closed
Bug 1343659
Opened 7 years ago
Closed 7 years ago
Mozharness should use log level ERROR for rust panics lines
Categories
(Release Engineering :: Applications: MozharnessCore, defect)
Release Engineering
Applications: MozharnessCore
Tracking
(firefox55 fixed)
VERIFIED
FIXED
Tracking | Status | |
---|---|---|
firefox55 | --- | fixed |
People
(Reporter: kats, Assigned: ahal)
References
Details
Attachments
(3 files, 1 obsolete file)
I would like lines like [1] to be recognized as errors in treeherder so that they show up in the failure summary and can be used to inform starring of bugs. I wrote a patch but I don't think I can test it easily. [1] https://treeherder.mozilla.org/logviewer.html#?job_id=79852537&repo=graphics&lineNumber=9313
Comment 1•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Attachment #8842588 -
Flags: review?(wlachance)
Reporter | ||
Updated•7 years ago
|
Assignee: nobody → bugmail
Comment 2•7 years ago
|
||
Thank you for the PR! Before we try this approach, would you mind seeing if mozharness can be made to output these lines with a log level of ERROR not INFO? (Since Treeherder already recognises mozharness ERROR prefixes) ie at the moment the line is incorrectly using INFO: [task 2017-02-24T04:20:20.368227Z] 04:20:20 INFO - ... We're trying to move away from having to hardcode every variation of error string in the log parser where possible.
Comment 3•7 years ago
|
||
Comment on attachment 8842588 [details] [review] [treeherder] staktrace:rustpanic > mozilla:master Ed's suggestion would be ideal if it's not too difficult (unfortunately I suspect it is). Otherwise I think this should be fine to go in with a unit test (see PR for details).
Attachment #8842588 -
Flags: review?(wlachance)
Comment 4•7 years ago
|
||
I think it should be possible via: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/errors.py
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Reporter | ||
Comment 7•7 years ago
|
||
Thanks all! I'll try to do the change in mozharness along the lines of what Wes posted in his patches. Although his patch seems to modify the "logcat" error strings which sounds like it's Android-specific?
Comment 8•7 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7) > Thanks all! I'll try to do the change in mozharness along the lines of what > Wes posted in his patches. Although his patch seems to modify the "logcat" > error strings which sounds like it's Android-specific? Yeah that doesn't seem right to me. Maybe we need some kind of new category for these types of error strings.
Suppose I should have pushed that with http://imgur.com/a/cp4ky
Reporter | ||
Comment 10•7 years ago
|
||
Well, I tried the patch at [1], and the job at [2] has the rust panic line in the output but it didn't get flagged as an error. Is there a way to test this locally? Trial-and-error with try pushes is going to be quite painful. [1] https://hg.mozilla.org/try/rev/3a255d30ece138d2bab3072dc9069a51a15f6db7 [2] https://treeherder.mozilla.org/logviewer.html#?job_id=81233456&repo=try&lineNumber=2289
Comment 11•7 years ago
|
||
Andrew, can you help Kats with this? See comment 10.
Flags: needinfo?(ahalberstadt)
Updated•7 years ago
|
Component: Treeherder: Log Viewer → Mozharness
Product: Tree Management → Release Engineering
QA Contact: jlund
Summary: TreeHerder should recognize rust panics as errors → Mozharness should use log level ERROR for rust panics lines
Updated•7 years ago
|
Attachment #8842588 -
Attachment is obsolete: true
Assignee | ||
Comment 12•7 years ago
|
||
Kats, I think you have it right but it looks like your regex doesn't match because the thread name is single-quoted. Try using: thread '(\w+)' panicked Unfortunately there's not a great way of testing this locally. I usually write a script that manually imports and mocks out an OutputParser object, and pipe logs into it from my mach runs.. But it's not really more convenient than pushing to try.
Flags: needinfo?(ahalberstadt)
Reporter | ||
Comment 13•7 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #12) > Kats, I think you have it right but it looks like your regex doesn't match > because the thread name is single-quoted. Try using: > thread '(\w+)' panicked That didn't work either. Try push at [1], log at [2]. I verified that the regex matches by running this in my shell: python -c "import re; print re.compile(r'''thread '([^']+)' panicked''').match(r'''thread 'RenderBackend' panicked at 'called''').group(0)" [1] https://hg.mozilla.org/try/rev/f85149064dc88f99baf8e57d322bcc6b20df959f [2] https://treeherder.mozilla.org/logviewer.html#?job_id=81315988&repo=try&lineNumber=2252
Assignee | ||
Comment 14•7 years ago
|
||
Ok, I see why adding a regex to the errorlist isn't working. That log message is being logged via the 'info' level from the structured logger. For example, check out the raw log: https://public-artifacts.taskcluster.net/fHFkweGmTOOzETCWHMbJuw/0/public/test_info//browser-chrome-gpu_raw.log Now ctrl-f for 'panicked' and you'll notice it's a 'log' action with 'level': 'info'. Mozharness/treeherder will automatically highlight anything logged with 'error' or above. The 'errorlist' regexes are only used to catch errors in unstructured output. Do you happen to know where that rust output is being logged from?
Reporter | ||
Comment 15•7 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #14) > Do you happen to know where that rust output > is being logged from? The rust output is spit out directly from the firefox binary to stderr, as far as I know. I'm not sure what all the different harness layers are doing with the output along the way.
Assignee | ||
Comment 16•7 years ago
|
||
I spy some mochitest stupidity.. I'm going to file a bug that blocks this, though it might take a bit of time to get right. So if you're in a rush to get this landed and wlach/emorley are ok with it, I'd recommend landing your original patch if possible. Ed, would it be alright to re-open/land the original pull request for now?
Flags: needinfo?(emorley)
Reporter | ||
Comment 17•7 years ago
|
||
I'm not in a particular rush to get this landed. I can wait a bit.
Flags: needinfo?(emorley)
Reporter | ||
Comment 18•7 years ago
|
||
Now that bug 1344346 is in m-c, I'll try the patch at [1] again. Let me know if there's more things that need to be fixed first. [1] https://hg.mozilla.org/try/rev/f85149064dc88f99baf8e57d322bcc6b20df959f
Reporter | ||
Comment 19•7 years ago
|
||
Still not working: https://treeherder.mozilla.org/logviewer.html#?job_id=83460775&repo=try&lineNumber=17598 The same log later has TEST-UNEXPECTED and PROCESS-CRASH, both of which are also "INFO" level despite already being listed as ERROR-level substrings at https://hg.mozilla.org/try/rev/9470d7dab8d2cc5914763a5dbd5b5a5f18e45b0d#l1.10
Assignee | ||
Comment 21•7 years ago
|
||
Crap, sorry.. thought that would fix it. I'm PTO until the end of May, so re-directing to jgraham.
Flags: needinfo?(ahalberstadt) → needinfo?(james)
Comment 22•7 years ago
|
||
Right, this is being logged as process_output from the reftest harness. Which seems prety reasonable, but I think you now need to land your changes in https://hg.mozilla.org/try/rev/3a255d30ece138d2bab3072dc9069a51a15f6db7 so that mozharness flags this process_output as an error. let me know if that doesn't work.
Flags: needinfo?(james)
Reporter | ||
Comment 23•7 years ago
|
||
If I understand you correctly, that's exactly what I did in the try push in comment 19, but it still didn't work. Or has more stuff landed in terms of changing the logging since then? I've kicked off another try push anyway, just to confirm: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cf051186a554bdc95d6354907f0ecd014300f5ac
Comment 24•7 years ago
|
||
Ah, sorry I forget that people put actual code in the try-cyntax commit (mach try is your friend!). I'll investigate again.
Flags: needinfo?(james)
Reporter | ||
Comment 25•7 years ago
|
||
Yeah it's definitely still not working: https://treeherder.mozilla.org/logviewer.html#?job_id=86264921&repo=try&lineNumber=17388
Reporter | ||
Comment 26•7 years ago
|
||
Any update here, James?
Comment 27•7 years ago
|
||
So the StruturedOutputParser wasn't using this code at all for lines with action="log". The branch in https://treeherder.mozilla.org/#/jobs?repo=try&revision=6b689b80021d22680b2f0da99cc36dfcc0d4f0f6 changes that, which I think will fix this case (the old logs were lost so I can't easily work out if this applies to the original problem). Can you rerun something that is expected to fail on top of the above changes?
Flags: needinfo?(james)
Updated•7 years ago
|
Flags: needinfo?(bugmail)
Reporter | ||
Comment 28•7 years ago
|
||
I kicked off a try push with your patches from the try run above plus a webrender change that should trigger a panic. https://treeherder.mozilla.org/#/jobs?repo=try&revision=d3fc73392c4d94eab0103b82b76178e4ddb29f3a Leaving needinfo on me until I confirm the panic-inducing change works as intended :)
Reporter | ||
Comment 29•7 years ago
|
||
Oh.. it looks like there's an error in one your patches: [task 2017-04-27T14:48:19.241204Z] Exception in thread Thread-1: [task 2017-04-27T14:48:19.241282Z] Traceback (most recent call last): [task 2017-04-27T14:48:19.241410Z] File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner [task 2017-04-27T14:48:19.241448Z] self.run() [task 2017-04-27T14:48:19.241490Z] File "/usr/lib/python2.7/threading.py", line 754, in run [task 2017-04-27T14:48:19.241621Z] self.__target(*self.__args, **self.__kwargs) [task 2017-04-27T14:48:19.241681Z] File "/home/worker/workspace/mozharness/mozprocess/processhandler.py", line 733, in _processOutput [task 2017-04-27T14:48:19.241720Z] self.processOutputLine(line.rstrip()) [task 2017-04-27T14:48:19.241805Z] File "/home/worker/workspace/mozharness/mozprocess/processhandler.py", line 697, in processOutputLine [task 2017-04-27T14:48:19.241836Z] handler(line) [task 2017-04-27T14:48:19.241885Z] File "/home/worker/workspace/mozharness/mozharness/base/script.py", line 1387, in processOutput [task 2017-04-27T14:48:19.241960Z] parser.add_lines(line) [task 2017-04-27T14:48:19.242070Z] File "/home/worker/workspace/mozharness/mozharness/base/log.py", line 356, in add_lines [task 2017-04-27T14:48:19.242124Z] self.parse_single_line(line) [task 2017-04-27T14:48:19.242182Z] File "/home/worker/workspace/mozharness/mozharness/mozilla/structuredlog.py", line 90, in parse_single_line [task 2017-04-27T14:48:19.242366Z] level = error_check.get(level, getattr(log, data["level"].upper())) [task 2017-04-27T14:48:19.242417Z] AttributeError: 'NoneType' object has no attribute 'get'
Flags: needinfo?(bugmail)
Comment 30•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cdb879bf2c4c82434ec143cfe3ef7ea572867b9c perhaps
Comment 31•7 years ago
|
||
So as far as I can tell the string you are matching never appears in the logs. Also note that there's a slight issue with this approach; the error will be found by the treeherder log parser but won't end up in the errorsummary.json file. If you wanted that you would have to change all the harnesses.
Flags: needinfo?(bugmail)
Reporter | ||
Comment 32•7 years ago
|
||
(In reply to James Graham [:jgraham] from comment #31) > So as far as I can tell the string you are matching never appears in the > logs. So I tried a few different approaches to triggering the panic. All of them work and produce the string when I run the reftests locally, but they don't show up in the log I see from treeherder. Considering that the string is output by the firefox binary itself, I'm pretty convinced that it's getting lost or eaten somewhere along the way. The way I'm running it locally is `mach reftest layout/reftests/reftest-sanity` (on a build with gfx.webrender.enabled=true), if that makes any difference.
Flags: needinfo?(bugmail)
Reporter | ||
Comment 33•7 years ago
|
||
Ok, I found another way to trigger the panic, and it is showing up in the raw log but not in the TH summary: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0164e3fed763e3d2f2bdf98f8677c86d6d131c8d&selectedJob=97110590
Flags: needinfo?(james)
Assignee | ||
Comment 34•7 years ago
|
||
I think I see what's going on. In bug 1344346 I started running 'process_output' actions through the error lists, but I made sure the base OutputParser class didn't do any actual printing because we still wanted to go through the proper formatter. Because I skipped this step, the 'process_output' still won't be printed with ERROR at the front. Essentially bug 1344346 succeeded in using 'process_output' to determine the worst_level (used for things like the overall return code of the run), but didn't modify the output at all. I have a patch I'll run through try before submitting.
Flags: needinfo?(james)
Assignee | ||
Comment 35•7 years ago
|
||
This patch [1] seems to get it working. However, in my try run [2], the panic line reads: GECKO(4580) | thread '<unnamed>' panicked at 'called `Result::unwrap()` Which doesn't get matched by the '(?:\w+)' part of the regex. I'm not sure why it doesn't know the thread name. Simply using .* does produce an ERROR though. I think I'll file a new bug to get the patch in [1] landed, then we can continue using this one to get the proper regex added. [1]: https://hg.mozilla.org/try/rev/5652f5690f71a09009d3f135fb8384838faf6abd [2]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=50e486301204a7d5a86b401eefa4be6c2909323f&selectedJob=101589582
Assignee | ||
Comment 36•7 years ago
|
||
Btw, here's a try run using .+ instead, finally with the panicked line highlighted: https://treeherder.mozilla.org/#/jobs?repo=try&revision=45983434984766aa0f0aa1219a1d62c2da0f5789&selectedJob=102021643 I guess the original '([^']+)' would have worked too.. Either way, we can land whatever regex we want after bug 1367745 is fixed. Should have a patch up over there soon.
Reporter | ||
Comment 37•7 years ago
|
||
\o/
Comment hidden (mozreview-request) |
Assignee | ||
Comment 39•7 years ago
|
||
Of course now that all the pieces are aligned, I wasn't able to reproduce the kernel panic to verify the fix: https://treeherder.mozilla.org/#/jobs?repo=try&revision=195c164fceeaa2f362e7224a2164df3e95432058 This was working in comment 36, though the regex there was slightly different. I tested this regex on the command line so I'm pretty confident it'll work as well.
Reporter | ||
Updated•7 years ago
|
Assignee: bugmail → ahalberstadt
Reporter | ||
Comment 40•7 years ago
|
||
mozreview-review |
Comment on attachment 8873081 [details] Bug 1343659 - [mozharness] Flag rust kernel panics as an ERROR, https://reviewboard.mozilla.org/r/144534/#review148388 Ok. I'm happy with landing this. If I run across a problem with panics not being detected we can always revisit this.
Attachment #8873081 -
Flags: review?(bugmail) → review+
Comment 41•7 years ago
|
||
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/72dfa41fadc1 [mozharness] Flag rust kernel panics as an ERROR, r=kats
Comment 42•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/72dfa41fadc1
Reporter | ||
Comment 43•7 years ago
|
||
I have try pushes where this is verified working. Thanks :ahal!
Status: RESOLVED → VERIFIED
Comment 44•7 years ago
|
||
Many thanks for persevering with this Andrew!
You need to log in
before you can comment on or make changes to this bug.
Description
•