Closed
Bug 1148860
Opened 9 years ago
Closed 9 years ago
queue: Allow artifacts up to 25min past a run is resolved as exception (for logs)
Categories
(Taskcluster :: Services, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jonasfj, Assigned: jonasfj)
Attachments
(1 file)
If a run is resolved as exception, we should still allow artifact uploads until 25min past the resolution time. At the moment the queue rejects any attempt to upload artifacts after a run is resolved. This is a special case for "exception": we still want logs, but we accept that they are best-effort when we have an exception.

Use-cases:

A) docker-worker encounters a spot termination warning from EC2. Instead of uploading logs and then calling reportException(worker-shutdown), we should call reportException(worker-shutdown) and then upload logs. This way, if we are terminated while uploading a large log file, we will still have reported the exception with worker-shutdown, so the queue will have scheduled a rerun immediately.

B) A user encounters something weird in the live-log from a task and cancels the task. The livelog will not be persisted, so when the user reports the issue to us, it'll be hard to debug :) As docker-worker calls reclaimTask every 20min, it should be able to detect the 409 and upload logs. Note that docker-worker listens for task-exception to get the cancel message, so in most cases it'll upload immediately after cancelTask. But in the current setup even that will fail.

Note: We should NOT allow artifacts to be uploaded after reportCompleted or reportFailed. In these cases we want to ensure that artifacts are present at the time of resolution, as logs aren't a best-effort service here.
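The rule proposed above can be sketched as a small predicate. This is only an illustration under stated assumptions, not the queue's actual code: the function name, the state strings and the argument names are hypothetical, and only the 25min window and the exception-only special case come from the description.

```python
from datetime import datetime, timedelta, timezone

# Grace period from the description above; the constant name is hypothetical.
EXCEPTION_GRACE = timedelta(minutes=25)


def may_upload_artifact(run_state, resolved_at, now):
    """Return True if an artifact upload should be accepted for this run.

    run_state   -- hypothetical state string: "running", "exception",
                   "completed" or "failed"
    resolved_at -- datetime the run was resolved, or None if still running
    now         -- current datetime
    """
    if run_state == "running":
        return True
    if run_state == "exception" and resolved_at is not None:
        # Best-effort logs: accept uploads until 25min past resolution.
        return now <= resolved_at + EXCEPTION_GRACE
    # completed / failed: artifacts must be present at resolution time.
    return False


resolved = datetime(2015, 4, 1, 12, 0, tzinfo=timezone.utc)
print(may_upload_artifact("exception", resolved, resolved + timedelta(minutes=10)))  # True
print(may_upload_artifact("exception", resolved, resolved + timedelta(minutes=30)))  # False
print(may_upload_artifact("completed", resolved, resolved + timedelta(minutes=1)))   # False
```

With a check like this, use-case A can report the exception first and upload logs afterwards, and use-case B can upload after the cancel, as long as both happen within the window.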
Comment 1•9 years ago
The buildbot bridge will throw exceptions like this until this is fixed:

Traceback (most recent call last):
  File "/builds/bbb/bin/buildbot-bridge", line 9, in <module>
    load_entry_point('bbb==0.3', 'console_scripts', 'buildbot-bridge')()
  File "/builds/bbb/lib/python2.7/site-packages/bbb-0.3-py2.7.egg/bbb/runner.py", line 81, in main
    service.start()
  File "/builds/bbb/lib/python2.7/site-packages/bbb-0.3-py2.7.egg/bbb/servicebase.py", line 300, in start
    connection.drain_events()
  File "/builds/bbb/lib/python2.7/site-packages/kombu/connection.py", line 275, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/builds/bbb/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 91, in drain_events
    return connection.drain_events(**kwargs)
  File "/builds/bbb/lib/python2.7/site-packages/amqp/connection.py", line 325, in drain_events
    return amqp_method(channel, args, content)
  File "/builds/bbb/lib/python2.7/site-packages/amqp/channel.py", line 1908, in _basic_deliver
    fun(msg)
  File "/builds/bbb/lib/python2.7/site-packages/kombu/messaging.py", line 592, in _receive_callback
    return on_m(message) if on_m else self.receive(decoded, message)
  File "/builds/bbb/lib/python2.7/site-packages/kombu/messaging.py", line 559, in receive
    [callback(body, message) for callback in callbacks]
  File "/builds/bbb/lib/python2.7/site-packages/bbb-0.3-py2.7.egg/bbb/services.py", line 118, in handleFinished
    createJsonArtifact(self.tc_queue, taskid, runid, "public/properties.json", properties, expires)
  File "/builds/bbb/lib/python2.7/site-packages/bbb-0.3-py2.7.egg/bbb/tcutils.py", line 20, in createJsonArtifact
    "expires": expires,
  File "/builds/bbb/lib/python2.7/site-packages/taskcluster/client.py", line 455, in apiCall
    return self._makeApiCall(e, *args, **kwargs)
  File "/builds/bbb/lib/python2.7/site-packages/taskcluster/client.py", line 232, in _makeApiCall
    return self._makeHttpRequest(entry['method'], route, payload)
  File "/builds/bbb/lib/python2.7/site-packages/taskcluster/client.py", line 424, in _makeHttpRequest
    superExc=rerr
taskcluster.exceptions.TaskclusterRestFailure: The given run is not running

Not a huge rush to fix, very nice to have though.
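The traceback above ends in an uncaught "The given run is not running" failure raised from an artifact upload. Until the queue accepts such uploads, a caller could treat the rejection as best-effort rather than fatal. The following is a minimal sketch of that idea, not the bridge's actual code: RunNotRunning is a hypothetical stand-in for the client's REST failure, and upload_best_effort is a hypothetical helper.

```python
class RunNotRunning(Exception):
    """Hypothetical stand-in for the REST failure seen in the traceback."""


def upload_best_effort(upload, log=print):
    """Call upload(); log and swallow a 'run is not running' rejection.

    upload -- zero-argument callable that performs the artifact upload
    log    -- callable used to report a rejected upload
    Returns True if the upload went through, False if it was rejected.
    """
    try:
        upload()
        return True
    except RunNotRunning as exc:
        # The run was already resolved; losing the artifact is accepted.
        log("artifact upload rejected: %s" % exc)
        return False


def rejected_upload():
    # Simulates the queue refusing an upload for a resolved run.
    raise RunNotRunning("The given run is not running")


print(upload_best_effort(rejected_upload))  # logs the rejection, then prints False
print(upload_best_effort(lambda: None))     # prints True
```

This only keeps the event loop alive; the artifact is still lost, which is why the queue-side change is the real fix.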
Assignee | ||
Comment 2•9 years ago
Should be a quick fix... I have a lot of reviewed stuff to roll out, also for queue, but will get back to this soon.
Severity: normal → major
Priority: -- → P2
Assignee | ||
Comment 3•9 years ago
Ideas for a better code path are welcome... But tests seem to pass. Add any comments or suggestions in the GitHub PR, thanks. Feel free to push the merge button, but this still requires a manual push; I'll undertake that when this is r+'ed.
Updated•9 years ago
Attachment #8610895 -
Flags: review?(garndt) → review+
Assignee | ||
Comment 4•9 years ago
Deployed, enjoy.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•9 years ago
Component: TaskCluster → Queue
Product: Testing → Taskcluster
Updated•5 years ago
Component: Queue → Services