Bug 716800
Opened 13 years ago
Closed 11 years ago
"talosError: Found processes still running: .*. Please close them before running talos" should set RETRY
Categories: Release Engineering :: General, defect, P3
Tracking: Not tracked
Status: RESOLVED WONTFIX
People: Reporter: philor; Unassigned
Details: Whiteboard: [automation]
Whether it's dwwin (bug 703996), firefox (bug 704380), or plugin-container (bug 714655), for releng's purposes the real meaning of "Found processes still running" is "something broke the last run, so the slave failed to reboot." The fix for that is "this time we'll reboot, and when the run is manually retriggered it will be fine." So instead of manually retriggering, we should be automatically RETRYing.
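A minimal sketch of what the request amounts to, assuming a harness that checks each log line against a table of (regex, status) pairs and reports the highest-precedence status that matched. The names `SUCCESS`, `FAILURE`, `RETRY`, `ERROR_PATTERNS`, and `classify_log`, the numeric codes, and the generic `FAIL: ` pattern are all hypothetical, not buildbotcustom's actual code. The pattern accepts either a `FAIL:` or a `talosError:` prefix because the message is later reported to have changed from one to the other.

```python
import re

# Hypothetical result codes; a larger value takes precedence when several patterns match.
SUCCESS = 0
FAILURE = 2
RETRY = 5

# Hypothetical (pattern, status) table.
ERROR_PATTERNS = [
    # Generic test failure: the run is marked failed.
    (re.compile(r"^FAIL: "), FAILURE),
    # Leftover processes mean the slave did not reboot cleanly, so retry the job elsewhere.
    (re.compile(r"(?:FAIL|talosError): Found processes still running: .*\. "
                r"Please close them before running talos"), RETRY),
]


def classify_log(lines):
    """Return the highest-precedence status whose pattern matches any log line."""
    status = SUCCESS
    for line in lines:
        for pattern, result in ERROR_PATTERNS:
            if pattern.search(line):
                status = max(status, result)
    return status
```

Because RETRY outranks FAILURE, a line such as `FAIL: Found processes still running: firefox. Please close them before running talos` matches both patterns and still yields RETRY, so the job would be handed to another slave rather than reported as an ordinary failure.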
Comment 1 • 13 years ago
I'm fuzzy on the details here: does setting RETRY on the releng side cause the entire build to be re-run (implying a reboot), or is only the step in question retried?
philor: do you still want the individual bugs mentioned in comment #0 left open for tracking frequency (vs. DUPing them to this bug)?
Priority: -- → P3
Whiteboard: [orange][automation]
Comment 2 • 13 years ago (Reporter)
Nobody is fuzzier than me, nobody! But what I meant was http://mxr.mozilla.org/build/source/buildbotcustom/status/errors.py#5, since this is much the same sort of thing as those Tegra failures: they are bugs, each its own separate snowflake of failure, but for a particular run that hit one, that slave should go reboot and another slave should be given the job to do it right.
dwwin is certainly an entirely separate bug: under no circumstances should any slave taking a job have it running. 704380 seems to me to be a bug in the script that runs Jetpack, or in the Talos process-finder, or in hdiutil (hello, pain), but it's still probably a bug we want to stop at the source rather than sweep away by trying another slave and hoping another reboot makes it go away. The plugin-container one I have no feeling about at all; no idea where that came from.
Comment 3 • 12 years ago
Mass marking whiteboard:[orange] bugs WFM (to clean up TBPL bug suggestions) that:
* Haven't changed in > 6 months
* Whose whiteboard contains none of the strings: {disabled,marked,random,fuzzy,todo,fails,failing,annotated,leave open,time-bomb}
* Passed a (quick) manual inspection of bug summary/whiteboard to ensure they weren't a false positive.
I've also gone through and searched for cases where the whiteboard wasn't labelled correctly after test disabling, using attachment descriptions and basic comment searches. However, if the test this bug was about has in fact been disabled/annotated/..., please accept my apologies, reopen it or mark the whiteboard appropriately so it doesn't get re-closed in the future, and ping me via IRC or email so I can tweak the saved searches to avoid more edge cases.
Sorry for the spam! Filter on: #FFA500
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Updated • 12 years ago (Reporter)
No longer blocks: 438871
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Whiteboard: [orange][automation] → [automation]
Comment 4 • 12 years ago (Reporter)
At least I think the message has now changed from FAIL: to talosError:.
Summary: "FAIL: Found processes still running: .*. Please close them before running talos" should set RETRY → "talosError: Found processes still running: .*. Please close them before running talos" should set RETRY
Comment 5 • 12 years ago
Comment 6 • 12 years ago
Comment 7 • 12 years ago
Updated • 12 years ago (Assignee)
Product: mozilla.org → Release Engineering
Comment 9 • 11 years ago (Reporter)
The error still exists in the talos code, and in any situation where it would be raised we should set RETRY, so the bug is valid in that sense. But either talos is broken and doesn't notice running processes, or we've reached the point where we really never let an unrebooted slave take a job, so there hasn't been anything to retry for months.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Flags: needinfo?(philringnalda)
Resolution: --- → WONTFIX