Open Bug 1558594 Opened 5 years ago Updated 3 years ago

Update Treeherder to reflect new known_intermittent field in Mozlog

Categories

(Tree Management :: Treeherder, task, P5)

Tracking

(Not tracked)

People

(Reporter: nikkis, Unassigned, Mentored)

Hello!

Just a friendly update on some mozilla-central changes that may later affect Treeherder. Mozlog has undergone some changes to make way for a new strategy to flag known intermittent test statuses.

In Mozlog's StructuredLogger, the test_status and test_end actions have a new field called known_intermittent, which takes the form of a list, e.g. ["FAIL", "CRASH"]. Formatters have been updated to reflect this new field and to record test statuses that are deemed known_intermittent for that test ID as expected. As such, some elements of Treeherder may need adjusting to ensure no breakages occur when this field is used.
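
As a rough illustration of the field described above, here is a minimal sketch in plain Python (it does not use mozlog itself). The message dicts and the helper name `is_unexpected` are made up for illustration. The sketch assumes that a raw message carries `expected` only when the status differs from the expected one, and that any status listed in `known_intermittent` counts as expected.

```python
import json


def is_unexpected(message):
    """Return True if a test_status/test_end message reports an unexpected result.

    Assumptions (not confirmed against mozlog): "expected" is present only
    when the status differed from the expected one, and statuses listed in
    "known_intermittent" are treated as expected.
    """
    if message.get("action") not in ("test_status", "test_end"):
        return False
    status = message.get("status")
    known_intermittent = message.get("known_intermittent", [])
    if status in known_intermittent:
        return False
    return "expected" in message and message["expected"] != status


# Hypothetical raw log lines, one JSON object per line.
raw_lines = [
    '{"action": "test_end", "test": "a.html", "status": "FAIL", '
    '"known_intermittent": ["FAIL", "CRASH"]}',
    '{"action": "test_end", "test": "b.html", "status": "FAIL", "expected": "PASS"}',
    '{"action": "test_end", "test": "c.html", "status": "PASS"}',
]

unexpected_tests = [m["test"] for m in map(json.loads, raw_lines) if is_unexpected(m)]
print(unexpected_tests)
```

Only the second message is reported here: the first one fails, but with a status that is listed as a known intermittent.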

I have added all changes made to the "blocks" section. The changes most relevant to Treeherder can be found here:
https://searchfox.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/structuredlog.py#39
https://searchfox.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/formatters/tbplformatter.py

Because Mozlog's TbplFormatter is utilised here [1], the FailureLine class [2] may need updating to add known_intermittent as a field.
[1] https://github.com/mozilla/treeherder/blob/f9be6e0b24cb6b19e4aeb55a50e136039838a990/treeherder/log_parser/crossreference.py
[2] https://github.com/mozilla/treeherder/blob/c23fafb518ba237d69438dd0f189ef780d137110/treeherder/model/models.py

Could you please advise if this suggestion is correct? In your opinion, do you think any other elements will be affected? I have sifted through the code base and do not think the ErrorParser or anything else should break, but am unsure if the front-end could also be updated accordingly... Any advice on how to proceed would be most welcome. Thank you in advance for your time!

Cheers,
Nikki (Outreachy participant)

ETA: @camd I hear you are the expert to ask! Thanks in advance!

Flags: needinfo?(cdawson)

At first blush, it doesn't sound like this would break anything. We will just ignore the new field. But it would be great to make use of it. I'd like to ingest that value and add it to the FailureLine model for use in Push Health and perhaps elsewhere.

I'll do a little more investigation on this tomorrow to see if I can spot any potential gotchas. Thanks! :)

Flags: needinfo?(cdawson)

That would be great! Let me know if/how I can add that for you. The code is easily added, but I have not yet dealt with migrations at Mozilla. There is a little info on how to do that at https://treeherder.readthedocs.io/installation.html, but I certainly wouldn't want to break anything :)

Thank you for the speedy reply!

I'm finally circling back around to this. So we will ignore the new field you're adding. The only fields we ingest are these:

https://github.com/mozilla/treeherder/blob/3240c598883934ba1082309301a10456402a2fa6/treeherder/log_parser/failureline.py#L91
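
To make the idea concrete, here is a minimal sketch of what carrying the new field through ingestion might look like. It is not the code in failureline.py: `INGESTED_FIELDS` and `extract_failure_fields` are hypothetical names, and the field list shown is only a guess at a plausible subset. The real set of ingested fields is whatever the linked line defines.

```python
import json

# Hypothetical subset of fields; the real list lives in failureline.py.
INGESTED_FIELDS = ("action", "test", "subtest", "status", "expected", "message")


def extract_failure_fields(raw_line, extra_fields=("known_intermittent",)):
    """Keep only the ingested fields of one raw log line, plus the new field."""
    message = json.loads(raw_line)
    fields = {name: message[name] for name in INGESTED_FIELDS if name in message}
    for name in extra_fields:
        # Default to an empty list so older logs without the field still work.
        fields[name] = message.get(name, [])
    return fields


# Hypothetical raw log line with one field ("thread") that is not ingested.
line = (
    '{"action": "test_end", "test": "a.html", "status": "FAIL", '
    '"known_intermittent": ["FAIL"], "thread": "main"}'
)
record = extract_failure_fields(line)
print(record)
```

Defaulting to an empty list is one possible design choice; it lets logs produced before the Mozlog change pass through the same code path.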

If you want to modify our code/schema so that we ingest that new field, that'd be great. But it may be a slow migration to add that field. So when we push it to production, we may need to do it on the weekend or some other off-peak time.

So to add that field to the FailureLine model, you'd do that in https://github.com/mozilla/treeherder/blob/c23fafb518ba237d69438dd0f189ef780d137110/treeherder/model/models.py#L947

And then you'd need to create a django migration for that. You'd be able to test this all in a local instance of Treeherder to ensure things get imported correctly. Please see https://treeherder.readthedocs.io/installation.html

I hope this helps! :)

Flags: needinfo?(cdawson)
Assignee: nobody → nsharpley

Ah, I suspected as much. Thank you for taking a look! I will work on adding that new field. Cheers. :)

Component: Treeherder → Treeherder: Log Parsing & Classification
Priority: -- → P5
Assignee: nikkisharpley → nobody
Component: Treeherder: Log Parsing & Classification → Database

I would like to work on this Sir. Please guide me on this.

Hello sir. Would love to give this a try. Can I be assigned this bug?

Flags: needinfo?(klahnakoski)

@Shivansh Srivastava:

Here is a structured log:
https://firefoxci.taskcluster-artifacts.net/PINeP5a5RNqNFuFrs8GXJg/0/public/test_info//mochitest-plain_raw.log
Here is a structured log with failures:
https://firefoxci.taskcluster-artifacts.net/DNdpLyF9TR-EPFTrFAuimA/0/public/test_info/mochitest-plain_raw.log

The challenge will be to find an example log to test the processing of known_intermittent. I looked around a bit, and camd has pointed out the correct places in the code to start looking, but that does not help in finding an example.
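
One way to check whether a given raw log is a usable example is to scan it for entries that set the field. The following is a minimal sketch, assuming the raw log holds one JSON object per line; the helper name `find_known_intermittent` and the sample lines are made up for illustration.

```python
import json


def find_known_intermittent(lines):
    """Yield (line_number, message) for test entries that set known_intermittent."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(message, dict):
            continue
        if message.get("action") in ("test_status", "test_end") and message.get(
            "known_intermittent"
        ):
            yield line_number, message


# Hypothetical sample standing in for the contents of a *_raw.log file.
sample = [
    '{"action": "test_start", "test": "a.html"}',
    '{"action": "test_end", "test": "a.html", "status": "FAIL", '
    '"known_intermittent": ["FAIL"]}',
    "not json at all",
    '{"action": "test_end", "test": "b.html", "status": "PASS"}',
]
matches = list(find_known_intermittent(sample))
for line_number, message in matches:
    print(line_number, message["test"], message["known_intermittent"])
```

For a downloaded log, an open file handle can be passed in place of `sample`, since the function only iterates over lines.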

You may consider updating the ActiveData-ETL pipeline: It processes all the tests in detail. If you capture the new known_intermittent property with ActiveData, then we can make an ActiveData query to find examples where it is used, and provide a test for the Treeherder code.

Be sure to submit your PR to the etl branch (the branch deployed to production).

Here is the main transform routine
https://github.com/klahnakoski/ActiveData-ETL/blob/f68fe74915c219d953a7d901f802688d50d3815c/activedata_etl/transforms/unittest_logs_to_sink.py#L42

There are a number of methods responsible for processing the various log messages. Here is the code for test_end:
https://github.com/klahnakoski/ActiveData-ETL/blob/f68fe74915c219d953a7d901f802688d50d3815c/activedata_etl/transforms/unittest_logs_to_sink.py#L342

Flags: needinfo?(klahnakoski)
Component: Database → TreeHerder