Open Bug 1596689 Opened 5 years ago Updated 2 years ago

Jobs not updating their state after being classified

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P3)

Tracking

(Not tracked)

People

(Reporter: CosminS, Unassigned)

Details

Attachments

(2 files)

Attached image image.png

I encountered this bug a few times: after I classified jobs, they kept being shown as unclassified, as in the attached picture.

The push with the job: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=1e1617c67238dfb685fb3d07bf1793232c4469fa&selectedJob=276390902

Job id: https://firefox-ci-tc.services.mozilla.com/tasks/XO32mFGUTiWT1_qniBM4Tw

This might be a regression from Bug 1595902. Also Treeherder is painfully slow in picking up the new classifications.
Armen, could you please take a look over this? Thank you.

Flags: needinfo?(armenzg)

Bug 1595902 is completely unrelated.

Cosmin, how long after does it start showing up?

I'm unfamiliar with how classifications are processed in the backend and then shown in the UI.

camd: What does the classification pipeline look like?
Should we separate the dynos that process classifications? I wonder if slowdowns in the log parsing dynos affect this.

Flags: needinfo?(cdawson)

Right now it is working OK and the jobs take their classified state as expected, but a few hours ago, when I filed this, they didn't. I know the night shift before us complained about it as well, saying that TH was generally slow. Next time I'll try to make a recording to pin down the exact behavior. Other sheriffs can pitch in if they want to.

Found another example of a job not taking its classification: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=fd63a50a812ec869d3e17e86f9a89ed47c8f1671&selectedJob=276404070
It ended at 13:35 EET and was classified by Noemi at 13:37 EET, but it is still shown as unclassified on the right side of TH now, while on the bottom left side it appears as already classified.

camd and I looked into this.

We are going to experiment with using the failure classification value from the job_note rather than from the job table.
The two values have diverged for that job, for unknown reasons.

Flags: needinfo?(armenzg)

As Armen said, we discussed it and have a plan in mind. Thanks Armen!!

Flags: needinfo?(cdawson)

A lot of jobs not updating their state in TH today: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=556e06b1b459b9884fd5490ba13bc255e833b84a&selectedJob=277122084

I'll leave some job IDs: https://firefox-ci-tc.services.mozilla.com/tasks/SVhK1vmDROemMvMAQfEf2g
https://firefox-ci-tc.services.mozilla.com/tasks/PbKG0nxwRiOcljiWL9V0tw
https://firefox-ci-tc.services.mozilla.com/tasks/UlPUA2HtRIung0n6WceH0Q

All of the jobs in this pic are actually classified but TH still shows them like this: https://pasteboard.co/IHwYmUV.png

Also jobs from this range on autoland remained unclassified: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&tochange=0206c2d1aae8093ae2c4c625bf247448b28a86c3&fromchange=79821df172391d2d9ab224951b36bd8856df0fb1&selectedJob=277128583

I think this might be an issue related to timezones and to when exactly the failure is classified.
For example, take this failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&searchStr=windows%2C10%2Cx64%2Cquantumrender%2Cdebug%2Cmochitests%2Cwith%2Csocket%2Cprocess%2Ctest-windows10-64-qr%2Fdebug-mochitest-media-spi-e10s%2Cm-spi%28mda%29&revision=556e06b1b459b9884fd5490ba13bc255e833b84a&selectedJob=277124084

https://pasteboard.co/IHxitza.png
Started: Wed, Nov 20, 07:59:43
Ended: Wed, Nov 20, 08:41:20
and is shown in TH as being classified on Wed, Nov 20, 06:53:13
Armen, any thoughts on this?

Flags: needinfo?(armenzg)
Assignee: nobody → armenzg
Flags: needinfo?(armenzg)
Priority: -- → P2

Hi, this is still happening today, even though I haven't seen it in a while.

Armen's plate is full at the moment, so I'll look into this as soon as I wrap up a bug I'm working on.

Assignee: armenzg → sclements
Status: NEW → ASSIGNED

Does this mismatched state occur after classifying via the bug filer, or after pinning a job and classifying it against an existing bug (so you're manually clicking save or hitting 'enter')? An exact workflow in which you have seen the errors would be helpful.

Armen and Cam, if you have any information you've discovered during your discussion of this bug that could be helpful, please let me know.

Priority: P2 → P1

This hasn't happened in a while or at least I didn't hit it again. It normally happened after pinning a job and classifying it against an existing bug. There weren't any errors shown by TH, only that after pinning the job and hitting save they would still appear in TH webpage as being unclassified just like shown in the screenshots. Will come back with examples when this happens again.

Looking at the time that this bug was filed it happened the day after the DB got into a degraded state.

The core problem is that the job_note would get created, however, the column on the job row would not get updated.

Always reading the classification from the job_note instead of the job row is what camd and I agreed on.
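
To illustrate the idea, here is a minimal sketch (not Treeherder's actual code) of deriving a job's classification from its most recent note and falling back to the job row only when no note exists. The `Job` and `JobNote` classes, the `effective_classification` helper, and the ids used (1 for "not classified", 4 for an example classification) are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed id meaning "not classified"; hypothetical for this sketch.
UNCLASSIFIED_ID = 1


@dataclass
class JobNote:
    """Hypothetical stand-in for a job_note row."""
    failure_classification_id: int
    created: float  # seconds since epoch


@dataclass
class Job:
    """Hypothetical stand-in for a job row and its related notes."""
    id: int
    failure_classification_id: int = UNCLASSIFIED_ID
    notes: List[JobNote] = field(default_factory=list)


def effective_classification(job: Job) -> int:
    """Prefer the newest note; use the job row only if there are no notes."""
    if not job.notes:
        return job.failure_classification_id
    latest = max(job.notes, key=lambda note: note.created)
    return latest.failure_classification_id


# Diverged state described above: the note exists, the job row was never updated.
diverged = Job(id=276404070, notes=[JobNote(failure_classification_id=4, created=100.0)])
print(diverged.failure_classification_id)  # value stored on the job row
print(effective_classification(diverged))  # value taken from the latest note
```

With this approach a job whose row was not updated would still be reported as classified, because the note is treated as the source of truth.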

(In reply to Cosmin Sabou [:CosminS] from comment #12)

This hasn't happened in a while or at least I didn't hit it again. It normally happened after pinning a job and classifying it against an existing bug. There weren't any errors shown by TH, only that after pinning the job and hitting save they would still appear in TH webpage as being unclassified just like shown in the screenshots. Will come back with examples when this happens again.

A few more questions: Do you remember if you still saw the green "classification" saved message during the times this issue appeared? Did it happen with just one pinned job or multiple or both? Did the unclassified count at the top change?

(In reply to Armen [:armenzg] from comment #13)

Looking at the time that this bug was filed it happened the day after the DB got into a degraded state.

The core problem is that the job_note would get created, however, the column on the job row would not get updated.

Always reading the classification from the job_note instead of the job row is what camd and I came to agree.

Could you elaborate? What I'm seeing in the code is that we are calling the job note API with classification.create(), but the classification itself is not being read from those results; it's based on the value in the job object. It appears that JobButton and JobGroup read the classification from job.classification_id, and when a classification changes, we directly mutate the job object in the jobMap (created in the top-level component) from the PinBoard component to change the classification, instead of updating the jobMap via redux: https://github.com/mozilla/treeherder/blob/master/ui/job-view/details/PinBoard.jsx#L123 So maybe this is where, if we receive new info for jobs that doesn't include that classification, it overrides the UI change (per your thought that it's a database issue)?

Another thought is that we are somehow not finding the jobInstance, so it doesn't update. I haven't been able to recreate this issue and I'm still looking through the code though, so that may not be it. Is there some sort of polling that happens?

@sclements I don't know this code myself. I just conveyed what I recalled from my conversation with Cam.
I know the UI is updated with the classification before the DB is updated. When reading the jobs again from the DB, the classification would go away since the job row had not been updated correctly.
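
The sequence described above can be modeled with a small simulation. This is only a sketch of the suspected failure mode, not the actual UI code: the `ui_jobs` dictionary, the `apply_poll` and `simulate` functions, and the classification ids (1 for "not classified", 4 for an example classification) are all hypothetical.

```python
from typing import Dict, List, Tuple

JobRow = Dict[str, int]


def apply_poll(ui_jobs: Dict[int, JobRow], polled_rows: List[JobRow]) -> None:
    """Replace the UI's copy of each job with the row read from the DB."""
    for row in polled_rows:
        ui_jobs[row["id"]] = dict(row)


def simulate() -> Tuple[int, int]:
    """Return the classification the UI shows after saving and after re-reading."""
    job_id = 276404070
    # UI state and DB job row both start as "not classified" (assumed id 1).
    ui_jobs = {job_id: {"id": job_id, "failure_classification_id": 1}}
    db_job_row = {"id": job_id, "failure_classification_id": 1}

    # Step 1: the UI marks the job as classified right away (assumed id 4).
    ui_jobs[job_id]["failure_classification_id"] = 4
    shown_after_save = ui_jobs[job_id]["failure_classification_id"]

    # Step 2: the note is stored, but the job row is never updated,
    # so db_job_row still carries the old value.

    # Step 3: the UI re-reads jobs from the DB and the stale row wins.
    apply_poll(ui_jobs, [db_job_row])
    shown_after_poll = ui_jobs[job_id]["failure_classification_id"]
    return shown_after_save, shown_after_poll


print(simulate())
```

In this model the job briefly appears classified and then reverts to unclassified once the stale row is read back.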

If this gets too complicated we could close the bug since this was a side effect of the DB degradation.

I spoke with Cam in the IRC channel and got a little more context into this. I'll look into creating the join on the note table. If that doesn't solve the issue, then we can re-open the bug and I can look a bit closer into what's happening in the UI that might be causing it to not update.

(In reply to Sarah Clements [:sclements] from comment #14)

(In reply to Cosmin Sabou [:CosminS] from comment #12)

This hasn't happened in a while or at least I didn't hit it again. It normally happened after pinning a job and classifying it against an existing bug. There weren't any errors shown by TH, only that after pinning the job and hitting save they would still appear in TH webpage as being unclassified just like shown in the screenshots. Will come back with examples when this happens again.

A few more questions: Do you remember if you still saw the green "classification" saved message during the times this issue appeared? Did it happen with just one pinned job or multiple or both? Did the unclassified count at the top change?

Hi

  • the green "classification" saved message appears
  • happens to one job, a random job on random pushes (regardless of whether it's classified separately or multiple jobs are selected to be classified)
  • the unclassified number does not change
  • job disappears only when page is refreshed (ctrl+f5)

Thanks Andreea.

Priority: P1 → P3

I checked in with the sheriffs in the treeherder channel and :dluca said that this still happens occasionally.

Assignee: sclements → nobody
Status: ASSIGNED → NEW