Jobs not updating their state after being classified
Categories
(Tree Management :: Treeherder: Data Ingestion, defect, P3)
Tracking
(Not tracked)
People
(Reporter: CosminS, Unassigned)
Details
Attachments
(2 files)
I encountered this bug a few times: after I classified jobs, they kept being shown as unclassified, as in the attached picture.
The push with the job: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=1e1617c67238dfb685fb3d07bf1793232c4469fa&selectedJob=276390902
Job id: https://firefox-ci-tc.services.mozilla.com/tasks/XO32mFGUTiWT1_qniBM4Tw
This might be a regression from Bug 1595902. Also Treeherder is painfully slow in picking up the new classifications.
Armen, could you please take a look over this? Thank you.
Comment 1•5 years ago
Bug 1595902 is completely unrelated.
Cosmin, how long after classifying does it start showing up?
I'm unfamiliar with how classifications are processed in the backend and then shown in the UI.
camd: What does the classification pipeline look like?
Should we separate the dynos that process classifications? I wonder if slowdowns in the log parsing dynos affect this.
Comment 2•5 years ago (Reporter)
Right now it is working OK and jobs take on the classified state as expected, but a few hours ago, when I filed this, they didn't. I know the night shift before us complained about it as well, and that TH was generally slow. Next time I'll try to make a recording to pin down the exact behavior. Other sheriffs can pitch in if they want to.
Comment 3•5 years ago (Reporter)
Found another example of a job not taking its classification: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=fd63a50a812ec869d3e17e86f9a89ed47c8f1671&selectedJob=276404070
It ended at 13:35 EET and was classified by Noemi at 13:37 EET, yet it is still shown as unclassified in the right part of TH, while on the bottom left side it looks like it was already classified.
Comment 4•5 years ago
camd and I looked into this.
We are going to experiment with using the failure classification value from the job_note rather than from the job table.
The two values have diverged for that job, for unknown reasons.
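To make the divergence described above concrete, here is a minimal sketch that looks for jobs whose job row disagrees with their newest note. The two-table schema, the column names, and the classification ids (1 for "not classified", 4 for a real classification) are simplifying assumptions for illustration, not Treeherder's actual schema or queries.

```python
import sqlite3

# Hypothetical, simplified schema; the real tables carry many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (
        id INTEGER PRIMARY KEY,
        failure_classification_id INTEGER NOT NULL
    );
    CREATE TABLE job_note (
        id INTEGER PRIMARY KEY,
        job_id INTEGER NOT NULL,
        failure_classification_id INTEGER NOT NULL
    );
    -- Job 1: note and job row agree. Job 2: note saved, job row left stale.
    INSERT INTO job VALUES (1, 4), (2, 1);
    INSERT INTO job_note VALUES (10, 1, 4), (11, 2, 4);
""")


def find_diverged_jobs(conn):
    """Return (job_id, value_on_job_row, value_on_newest_note) for mismatches."""
    return conn.execute("""
        SELECT j.id, j.failure_classification_id, n.failure_classification_id
        FROM job AS j
        JOIN job_note AS n ON n.job_id = j.id
        WHERE n.id = (SELECT MAX(m.id) FROM job_note AS m WHERE m.job_id = j.id)
          AND n.failure_classification_id != j.failure_classification_id
        ORDER BY j.id
    """).fetchall()


diverged = find_diverged_jobs(conn)
print(diverged)  # [(2, 1, 4)]
```

A check like this only reports the mismatch; it does not explain why the job row was left behind.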
Comment 5•5 years ago
As Armen said, we discussed it and have a plan in mind. Thanks Armen!!
Comment 6•5 years ago
Comment 7•5 years ago
Comment 8•5 years ago (Reporter)
A lot of jobs are not updating their state in TH today: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=556e06b1b459b9884fd5490ba13bc255e833b84a&selectedJob=277122084
I'll leave some job IDs: https://firefox-ci-tc.services.mozilla.com/tasks/SVhK1vmDROemMvMAQfEf2g
https://firefox-ci-tc.services.mozilla.com/tasks/PbKG0nxwRiOcljiWL9V0tw
https://firefox-ci-tc.services.mozilla.com/tasks/UlPUA2HtRIung0n6WceH0Q
All of the jobs in this pic are actually classified but TH still shows them like this: https://pasteboard.co/IHwYmUV.png
Also jobs from this range on autoland remained unclassified: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&tochange=0206c2d1aae8093ae2c4c625bf247448b28a86c3&fromchange=79821df172391d2d9ab224951b36bd8856df0fb1&selectedJob=277128583
I think this might be an issue related to timezones and when exactly the failure is classified.
For example, take this failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&searchStr=windows%2C10%2Cx64%2Cquantumrender%2Cdebug%2Cmochitests%2Cwith%2Csocket%2Cprocess%2Ctest-windows10-64-qr%2Fdebug-mochitest-media-spi-e10s%2Cm-spi%28mda%29&revision=556e06b1b459b9884fd5490ba13bc255e833b84a&selectedJob=277124084
https://pasteboard.co/IHxitza.png
Started: Wed, Nov 20, 07:59:43
Ended: Wed, Nov 20, 08:41:20
and is shown in TH as being classified on Wed, Nov 20, 06:53:13
Armen, any thoughts on this?
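One possible reading of these timestamps, which nothing in this bug confirms, is that the start and end times are rendered in the sheriff's local zone while the classification time is rendered in UTC. The sketch below checks only the arithmetic of that guess. The UTC+2 offset (EET in November) and the year 2019 are assumptions, since the report gives only "Wed, Nov 20".

```python
from datetime import datetime, timedelta, timezone

# Assumption: local zone is EET, which is UTC+2 in November.
EET = timezone(timedelta(hours=2))
# Assumption: the report omits the year.
YEAR = 2019

# "Ended" as shown in the report, read as local time.
ended_local = datetime(YEAR, 11, 20, 8, 41, 20, tzinfo=EET)
# "Classified" as shown in the report, read as UTC (the hypothesis being tested).
classified_utc = datetime(YEAR, 11, 20, 6, 53, 13, tzinfo=timezone.utc)

classified_local = classified_utc.astimezone(EET)
print(classified_local.strftime("%H:%M:%S"))  # 08:53:13
print(classified_local > ended_local)         # True
```

Under that assumption the classification lands about twelve minutes after the job ended, which would make the ordering plausible again. If all three values are in fact rendered in the same zone, this explanation does not apply.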
Comment 9•4 years ago
Hi, this is still happening today, even though I haven't seen it in a while.
Comment 10•4 years ago
Armen's plate is full at the moment, so I'll look into this as soon as I wrap up a bug I'm working on.
Comment 11•4 years ago
Does this mismatched state occur after classifying via the bug filer, or after pinning a job and classifying it against an existing bug (so you're manually clicking save or hitting 'enter')? An exact workflow where you have seen the errors would be helpful.
Armen and Cam, if you have any information you've discovered during your discussion of this bug that could be helpful, please let me know.
Comment 12•4 years ago (Reporter)
This hasn't happened in a while, or at least I didn't hit it again. It normally happened after pinning a job and classifying it against an existing bug. TH didn't show any errors; after pinning the job and hitting save, the jobs would still appear on the TH webpage as unclassified, just as shown in the screenshots. I will come back with examples when this happens again.
Comment 13•4 years ago
Looking at when this bug was filed, it happened the day after the DB got into a degraded state.
The core problem is that the job_note would get created, but the column on the job row would not get updated.
What camd and I agreed on is to always read the classification from the job_note instead of the job row.
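The failure mode described here, a note that exists without the matching update on the job row, is what a shared transaction around both writes is meant to prevent. The following is a minimal sketch of that idea using `sqlite3`. The schema and the `classify_job` helper are hypothetical and do not show how Treeherder actually performs these writes.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (
        id INTEGER PRIMARY KEY,
        failure_classification_id INTEGER NOT NULL
    );
    CREATE TABLE job_note (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        job_id INTEGER NOT NULL,
        failure_classification_id INTEGER NOT NULL
    );
    INSERT INTO job VALUES (1, 1);
""")


def classify_job(conn, job_id, classification_id):
    """Insert the note and update the job row as one all-or-nothing unit."""
    with conn:  # commits on success, rolls back both statements on an exception
        conn.execute(
            "INSERT INTO job_note (job_id, failure_classification_id) VALUES (?, ?)",
            (job_id, classification_id),
        )
        updated = conn.execute(
            "UPDATE job SET failure_classification_id = ? WHERE id = ?",
            (classification_id, job_id),
        ).rowcount
        if updated != 1:
            raise LookupError(f"job {job_id} not found")


classify_job(conn, 1, 4)
try:
    classify_job(conn, 99, 4)  # unknown job: the note insert is rolled back too
except LookupError:
    pass

note_count = conn.execute("SELECT COUNT(*) FROM job_note").fetchone()[0]
job_value = conn.execute(
    "SELECT failure_classification_id FROM job WHERE id = 1"
).fetchone()[0]
print(note_count, job_value)  # 1 4
```

Whether the two writes in production share a transaction, and whether the degraded DB state could break that, is not established in this bug.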
Comment 14•4 years ago
(In reply to Cosmin Sabou [:CosminS] from comment #12)
A few more questions: Do you remember if you still saw the green "classification" saved message during the times this issue appeared? Did it happen with just one pinned job or multiple or both? Did the unclassified count at the top change?
Comment 15•4 years ago
(In reply to Armen [:armenzg] from comment #13)
Could you elaborate? What I'm seeing in the code is that we are calling the job note API with classification.create(),
but the classification itself is not being read from those results; it's based on the value in the job object. It appears that JobButton and JobGroup read the classification from job.classification_id, and when a classification changes, the Pinboard component directly mutates the job object in the jobMap (created in the top-level component) instead of updating the jobMap via redux: https://github.com/mozilla/treeherder/blob/master/ui/job-view/details/PinBoard.jsx#L123 So maybe, if we then receive new info for jobs that don't have that classification, it overrides the UI change (per your thought that it's a database issue)?
Another thought is that we are somehow not finding the jobInstance, so it doesn't update. I haven't been able to recreate this issue and I'm still looking through the code, though, so that may not be it. Is there some sort of polling that happens?
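The override scenario raised in this comment can be modelled in a few lines. This is a language-neutral sketch written in Python rather than the UI's JavaScript. The dictionary standing in for jobMap, both helper functions, the classification ids, and the existence of a polling refresh are all assumptions made for illustration.

```python
UNCLASSIFIED = 1  # assumption: id meaning "not classified"
INTERMITTENT = 4  # assumption: id of some real classification

# Stand-in for the UI's jobMap: job id -> cached job object.
job_map = {1: {"id": 1, "failure_classification_id": UNCLASSIFIED}}


def classify_in_place(job_map, job_id, classification_id):
    """Mimic the UI mutating the cached job object directly after a save."""
    job_map[job_id]["failure_classification_id"] = classification_id


def apply_poll(job_map, polled_jobs):
    """Mimic a refresh that overwrites cached jobs with server data."""
    for job in polled_jobs:
        job_map[job["id"]] = dict(job)


classify_in_place(job_map, 1, INTERMITTENT)
after_save = job_map[1]["failure_classification_id"]

# Server response built from a job row that was never updated.
apply_poll(job_map, [{"id": 1, "failure_classification_id": UNCLASSIFIED}])
after_poll = job_map[1]["failure_classification_id"]

print(after_save, after_poll)  # 4 1
```

If something like this happens in the real UI, the job would look classified right after saving and revert on the next refresh, so the backend value would be the one to fix.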
Comment 16•4 years ago
@sclements I don't know this code myself. I just conveyed what I recalled from my conversation with Cam.
I know the UI is updated with the classification before the DB is updated. When the jobs are read again from the DB, the classification would go away since the job row had not been updated correctly.
If this gets too complicated we could close the bug since this was a side effect of the DB degradation.
Comment 17•4 years ago
I spoke with Cam in the IRC channel and got a little more context into this. I'll look into creating the join on the note table. If that doesn't solve the issue, then we can re-open the bug and I can look a bit closer into what's happening in the UI that might be causing it to not update.
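A join on the note table could look roughly like the sketch below: take the classification from the newest note when one exists and fall back to the job row otherwise. The schema, the ids, and the "newest note wins" rule are assumptions for illustration. The change that was actually planned may differ.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (
        id INTEGER PRIMARY KEY,
        failure_classification_id INTEGER NOT NULL
    );
    CREATE TABLE job_note (
        id INTEGER PRIMARY KEY,
        job_id INTEGER NOT NULL,
        failure_classification_id INTEGER NOT NULL
    );
    -- All three job rows are stale (1 = not classified in this sketch).
    INSERT INTO job VALUES (1, 1), (2, 1), (3, 1);
    -- Job 1 has no note, job 2 has one, job 3 was reclassified (4, then 2).
    INSERT INTO job_note VALUES (10, 2, 4), (11, 3, 4), (12, 3, 2);
""")


def effective_classifications(conn):
    """Map job id -> classification from the newest note, else the job row."""
    rows = conn.execute("""
        SELECT j.id,
               COALESCE(n.failure_classification_id, j.failure_classification_id)
        FROM job AS j
        LEFT JOIN job_note AS n
          ON n.id = (SELECT MAX(m.id) FROM job_note AS m WHERE m.job_id = j.id)
        ORDER BY j.id
    """).fetchall()
    return dict(rows)


result = effective_classifications(conn)
print(result)  # {1: 1, 2: 4, 3: 2}
```

Reading through the note this way hides a stale job row from the UI, but filters or counts that still read the job column directly would keep showing the old value.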
Comment 18•4 years ago
(In reply to Sarah Clements [:sclements] from comment #14)
Hi
- the green "classification" saved message appears
- happens to one job, random job on random pushes (regardless if it's classified separately, or if there are multiple jobs selected to be classified)
- the unclassified number does not change
- job disappears only when page is refreshed (ctrl+f5)
Comment 19•4 years ago
Thanks Andreea.
Comment 20•3 years ago
I checked in with the sheriffs in the treeherder channel and :dluca said that this still happens occasionally.