Update Treeherder to reflect new known_intermittent field in Mozlog
Categories
(Tree Management :: Treeherder, task, P5)
Tracking
(Not tracked)
People
(Reporter: nikkis, Unassigned, Mentored)
Hello!
Just a friendly update on some mozilla-central changes that may later affect Treeherder. Mozlog has undergone some changes to make way for a new strategy to flag known intermittent test statuses.
In Mozlog's StructuredLogger, the actions test_status and test_end have a new field called known_intermittent, which takes the form of a list, e.g. ["FAIL", "CRASH"]. Formatters have been updated to reflect this new field and record test statuses that are deemed known_intermittent for that test ID as expected. As such, some elements of Treeherder may need adjusting to ensure no breakages occur when this field is used.
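To make the intended behaviour concrete, here is a minimal Python sketch of how a consumer might treat a test_status or test_end message carrying the new field. The function name classify and the sample message are hypothetical; the sketch also assumes that a message without an "expected" key means the status matched the expectation. It is an illustration of the idea, not Mozlog's or Treeherder's actual code.

```python
import json


def classify(message):
    """Classify a test_status/test_end message (hypothetical helper)."""
    status = message["status"]
    # Assumption: "expected" is omitted when the status matched it.
    expected = message.get("expected", status)
    # Older logs will not carry the field at all, so default to an empty list.
    known_intermittent = message.get("known_intermittent", [])

    if status == expected:
        return "expected"
    if status in known_intermittent:
        # A known intermittent status is treated like an expected one.
        return "known_intermittent"
    return "unexpected"


# Illustrative message only; the test name and subtest are made up.
line = (
    '{"action": "test_status", "test": "/dom/example.html", '
    '"subtest": "t1", "status": "FAIL", "expected": "PASS", '
    '"known_intermittent": ["FAIL", "CRASH"]}'
)
print(classify(json.loads(line)))
```

A consumer that only reads "status" and "expected" keeps working unchanged; it simply does not see the extra classification.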
I have added all changes made to the "blocks" section. The changes most relevant to Treeherder can be found here:
https://searchfox.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/structuredlog.py#39
https://searchfox.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/formatters/tbplformatter.py
Because Mozlog's TbplFormatter is utilised here [1], the FailureLine class [2] may need updating to add known_intermittent as a field.
[1] https://github.com/mozilla/treeherder/blob/f9be6e0b24cb6b19e4aeb55a50e136039838a990/treeherder/log_parser/crossreference.py
[2] https://github.com/mozilla/treeherder/blob/c23fafb518ba237d69438dd0f189ef780d137110/treeherder/model/models.py
Could you please advise if this suggestion is correct? In your opinion, do you think any other elements will be affected? I have sifted through the code base and do not think the ErrorParser or anything else should break, but am unsure if the front-end could also be updated accordingly... Any advice on how to proceed would be most welcome. Thank you in advance for your time!
Cheers,
Nikki (Outreachy participant)
ETA: @camd I hear you are the expert to ask! Thanks in advance!
Comment 1•5 years ago
At first blush, it doesn't sound like this would break anything. We will just ignore the new field. But it would be great to make use of it. I'd like to ingest that value and add it to the FailureLine model for use in Push Health and perhaps elsewhere.
I'll do a little more investigation on this tomorrow to see if I can spot any potential gotchas. Thanks! :)
Comment 2 (Reporter)•5 years ago
That would be great! Let me know if/how I can add that for you. The code is easily added, but I have not yet dealt with migrations and Mozilla. There is a little info on how to at https://treeherder.readthedocs.io/installation.html, but I certainly wouldn't want to break anything :)
Thank you for the speedy reply!
Comment 3•5 years ago
I'm finally circling back around to this. So we will ignore the new field you're adding. The only fields we ingest are these:
If you want to modify our code/schema so that we ingest that new field, that'd be great. But it may be a slow migration to add that field. So when we push it to production, we may need to do it on the weekend or some other off-peak time.
So to add that field to the FailureLine model, you'd do that in https://github.com/mozilla/treeherder/blob/c23fafb518ba237d69438dd0f189ef780d137110/treeherder/model/models.py#L947
And then you'd need to create a django migration for that. You'd be able to test this all in a local instance of Treeherder to ensure things get imported correctly. Please see https://treeherder.readthedocs.io/installation.html
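As a rough illustration of the ingestion side, the sketch below uses a plain dataclass as a hypothetical stand-in for FailureLine; it is not the real Django model, and the real change would be a new model field plus a migration as described above. The names FailureLineSketch and from_log_line are made up for this example. What it shows is only the backward-compatibility point: logs produced before the field existed should still load, with known_intermittent defaulting to an empty list.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailureLineSketch:
    """Hypothetical stand-in for the FailureLine model, for illustration only."""

    action: str
    test: Optional[str] = None
    status: Optional[str] = None
    expected: Optional[str] = None
    # New field: empty when the log line does not provide it.
    known_intermittent: List[str] = field(default_factory=list)


def from_log_line(raw: str) -> FailureLineSketch:
    """Build a record from one JSON line of a structured log."""
    data = json.loads(raw)
    return FailureLineSketch(
        action=data["action"],
        test=data.get("test"),
        status=data.get("status"),
        expected=data.get("expected"),
        known_intermittent=list(data.get("known_intermittent", [])),
    )


old_style = '{"action": "test_end", "test": "a.html", "status": "FAIL", "expected": "PASS"}'
new_style = (
    '{"action": "test_end", "test": "a.html", "status": "FAIL", '
    '"expected": "PASS", "known_intermittent": ["FAIL", "CRASH"]}'
)
print(from_log_line(old_style).known_intermittent)
print(from_log_line(new_style).known_intermittent)
```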
I hope this helps! :)
Comment 4 (Reporter)•5 years ago
Ah, I suspected as much. Thank you for taking a look! I will work on adding that new field. Cheers. :)
Comment 5•5 years ago
I would like to work on this Sir. Please guide me on this.
Comment 6•5 years ago
Hello sir. Would love to give this a try. Can I be assigned this bug?
Comment 7•5 years ago
@Shivansh Srivastava:
Here is a structured log:
https://firefoxci.taskcluster-artifacts.net/PINeP5a5RNqNFuFrs8GXJg/0/public/test_info//mochitest-plain_raw.log
Here is a structured log with failures:
https://firefoxci.taskcluster-artifacts.net/DNdpLyF9TR-EPFTrFAuimA/0/public/test_info/mochitest-plain_raw.log
The challenge will be to find an example log to test processing known_intermittent. I looked around a bit, and camd has pointed out the correct code places to start looking, but that is no help to find an example.
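For anyone checking a downloaded raw log by hand, a small Python sketch like the following could scan it for messages that actually use the field. The function name find_known_intermittent is hypothetical, and the sketch assumes the raw log contains one JSON object per line, skipping anything that does not parse.

```python
import json


def find_known_intermittent(lines):
    """Return test_status/test_end messages with a non-empty known_intermittent."""
    hits = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            message = json.loads(raw)
        except json.JSONDecodeError:
            # Skip lines that are not JSON.
            continue
        if not isinstance(message, dict):
            continue
        if message.get("action") in ("test_status", "test_end") and message.get(
            "known_intermittent"
        ):
            hits.append(message)
    return hits


# Made-up sample lines standing in for a downloaded raw log.
sample = [
    '{"action": "test_start", "test": "a.html"}',
    '{"action": "test_end", "test": "a.html", "status": "FAIL", "known_intermittent": ["FAIL"]}',
    "not json at all",
    '{"action": "test_end", "test": "b.html", "status": "PASS"}',
]
for hit in find_known_intermittent(sample):
    print(hit["test"], hit["known_intermittent"])

# With a local copy of a log, usage might look like:
# with open("mochitest-plain_raw.log") as f:
#     print(len(find_known_intermittent(f)))
```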
You may consider updating the ActiveData-ETL pipeline: it processes all the tests in detail. If you capture the new known_intermittent property with ActiveData, then we can make an ActiveData query to find examples where it is used, and provide a test for the Treeherder code.
Be sure to submit your PR to the etl branch (the branch deployed to production).
Here is the main transform routine
https://github.com/klahnakoski/ActiveData-ETL/blob/f68fe74915c219d953a7d901f802688d50d3815c/activedata_etl/transforms/unittest_logs_to_sink.py#L42
There are a number of methods responsible for processing the various log messages; here is the code for test_end:
https://github.com/klahnakoski/ActiveData-ETL/blob/f68fe74915c219d953a7d901f802688d50d3815c/activedata_etl/transforms/unittest_logs_to_sink.py#L342