Closed
Bug 620596
Opened 14 years ago
Closed 7 months ago
Need some form of queue for posting of results to graphs server
Categories
(Webtools Graveyard :: Graph Server, defect)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: fox2mike, Unassigned)
Details
If talos machines are unable to reach the graphs DB, tests fail (as of now) and the tree starts showing up red.
This should be modified into a queue system that changes the tree colour to signal possible issues uploading to graphs, and retries every x minutes over y hours before failing and turning the tree red.
Opinions? Thoughts? I'm CC'ing zandr since he's going to have a talk to joduinn about this in person.
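The retry behaviour described above could be sketched roughly as follows. This is a hypothetical outline only: `post_fn` stands in for whatever code actually uploads a result set, and the interval/window defaults are placeholders for the undecided "every x minutes over y hours".

```python
import time

def post_with_retry(post_fn, interval_minutes=5, window_hours=2, sleep=time.sleep):
    """Retry post_fn until it succeeds or the retry window expires.

    post_fn is a hypothetical callable that uploads one result set to
    the graphs server and raises on failure. interval_minutes and
    window_hours are placeholders for the "x minutes over y hours"
    values that have not been decided in this bug.
    """
    deadline = time.monotonic() + window_hours * 3600
    while True:
        try:
            return post_fn()  # success: stop retrying
        except Exception:
            if time.monotonic() >= deadline:
                raise  # retry window exhausted: fail, tree goes red
            sleep(interval_minutes * 60)  # wait, then try again
```

While retries are pending, the tree could show a non-red colour indicating a possible graphs upload problem rather than a test failure.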
Reporter
Comment 1•14 years ago
Also, I understand that this might not be the desired behaviour, but would like some discussion before we decide one way or another.
Comment 2•14 years ago
I think this is a Graph Server bug, not necessarily a RelEng one.
Component: Release Engineering → Graph Server
Product: mozilla.org → Webtools
QA Contact: release → graph.server
Reporter
Comment 3•14 years ago
Not really.
Who handles the part where the talos machines write to the graph server? RelEng I bet :)
19:24:14 < fox2mike> bhearsum: who handles the code that makes the talos machines contact the graphs server?
19:24:23 < bhearsum> releng
19:24:27 < bhearsum> it should be a server side queue, though
Component: Graph Server → Release Engineering
Product: Webtools → mozilla.org
QA Contact: graph.server → release
Comment 4•14 years ago
Fine, this can stay here. I still don't believe that such a queue should be disassociated from the server, though.
Reporter
Comment 5•14 years ago
If the graphs server team handles Talos code, I'd be happy to pass this to them :) I'm not the one to decide where release engineering related bugs go, so I'll defer to you guys on that :D
Comment 6•14 years ago
We're bikeshedding about the wrong thing here.
1) graphserver as SPOF is not new. I agree that some form of redundant collector would be good, but only if we can do it without turning graphserver into Son of Socorro.
2) Talos machines don't have any long-term persistence. They reboot and clobber with great frequency. So if the results aren't posted, they're gone. As such, this is desired behavior in the current world. There is a whole different discussion about distinguishing between failed tests and failed testers, but that's Not Trivial.
3) The recent breakage was caused by graphserver posting to the AMO db, and apparently doing that synchronously with the post from the slave. That is insane, unacceptable, and the response to https://bugzilla.mozilla.org/show_bug.cgi?id=620570#c10 is where that conversation will take place.
Comment 7•14 years ago
I think that the most basic solution here is a message queue, with the Talos results being producers, and the graph server as a consumer. Might need some adjustment to Talos/unittests if graph server sends back any data.
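The producer/consumer shape suggested above could be sketched in-process like this, with Python's standard `queue` module standing in for a real message broker; all names here are illustrative, not part of any existing Talos or graphserver code.

```python
import queue
import threading

results = queue.Queue()  # stands in for a real message broker

def talos_producer(run_id):
    # Each Talos run just enqueues its results and moves on; it no
    # longer blocks on (or fails with) the graphs DB being reachable.
    results.put({"run": run_id, "data": "perf numbers"})

def graphserver_consumer(store):
    # The graph server drains the queue at its own pace; a transient
    # DB outage only delays consumption instead of failing the run.
    while True:
        item = results.get()
        if item is None:  # sentinel: shut down the consumer
            break
        store.append(item)
        results.task_done()

store = []
worker = threading.Thread(target=graphserver_consumer, args=(store,))
worker.start()
for i in range(3):
    talos_producer(i)
results.put(None)
worker.join()
```

As noted, if the graph server ever sends data back to Talos, the flow would need a reply channel rather than this fire-and-forget shape.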
Updated•14 years ago
Component: Release Engineering → Talos
Product: mozilla.org → Testing
QA Contact: release → talos
Version: other → Trunk
Updated•12 years ago
Component: Talos → Webdev
Product: Testing → mozilla.org
Version: Trunk → other
Comment 8•12 years ago
So this is a graphserver issue. The queue needs to be on the graphserver side, not the Talos side (potentially, it could live elsewhere in infrastructure as well, but I'm guessing graphserver makes the most sense). That said, graphserver is going to be replaced with datazilla, which already has such a queuing system in place.
Updated•12 years ago
Component: Webdev → Graph Server
Product: mozilla.org → Webtools
Assignee
Updated•8 years ago
Product: Webtools → Webtools Graveyard
Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → INCOMPLETE