Closed
Bug 1120919
Opened 10 years ago
Closed 10 years ago
[pulsetranslator] Do not skip messages with routing keys ending with .finished instead of .log_uploaded
Categories
(Webtools :: Pulse, defect)
Webtools
Pulse
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: whimboo, Assigned: whimboo)
References
Details
(Whiteboard: [qa-automation-blocked])
Attachments
(1 file)
1.80 KB,
patch
|
mcote
:
review+
|
Details | Diff | Splinter Review |
I'm currently setting up a script for bug 735184 which listens for all build messages coming in. It filters out specific messages for update channel activities. While doing that I noticed that not all tasks have a .log_uploaded message! Especially those which are about updates and do not upload a log file. What I can say is that we always have a .finished message.
Here the example for update messages:
> -rw-rw-r-- 1 henrik henrik 12694 Jan 13 12:15 2015-01-13T11:15:24Z-build.release-mozilla-beta-firefox_updates.0.started
> -rw-rw-r-- 1 henrik henrik 17178 Jan 13 12:45 2015-01-13T11:45:13Z-build.release-mozilla-beta-firefox_updates.0.finished
Currently pulsetranslator doesn't see those messages at all, so that they are not available on the normalized exchange.
Mark, what do you think? Is that something we could easily switch?
Flags: needinfo?(mcote)
Assignee | ||
Comment 1•10 years ago
|
||
While further checking the order of .log_uploaded and .finished, I see a strange ordering problem. While in most of the cases the .finished message really comes last, in some cases it's before .log_uploaded.
.log_uploaded before .finished:
> -rw-rw-r-- 1 henrik henrik 6236 Jan 13 13:13 2015-01-13T12:13:21Z-build.release-mozilla-beta-win32_partner_repack.19.log_uploaded
> -rw-rw-r-- 1 henrik henrik 6226 Jan 13 13:13 2015-01-13T12:13:26Z-build.release-mozilla-beta-win32_partner_repack.19.finished
finished before .log_uploaded:
-rw-rw-r-- 1 henrik henrik 1582 Jan 13 12:15 2015-01-13T11:15:21Z-build.release-mozilla-beta-win32_repack_complete.1.finished
-rw-rw-r-- 1 henrik henrik 1904 Jan 13 12:15 2015-01-13T11:15:37Z-build.release-mozilla-beta-win32_repack_complete.1.log_uploaded
Chris, could that be a problem with the builders and some kind of asynchronous tasks?
Flags: needinfo?(catlee)
Assignee | ||
Comment 2•10 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #0)
> I'm currently setting up a script for bug 735184 which listens for all build
> messages coming in. It filters out specific messages for update channel
> activities. While doing that I noticed that not all tasks have a
> .log_uploaded message! Especially those which are about updates and do not
Looks like I was wrong here. We do have updates and also upload them. The specific message was kinda hidden between all those notifications. So here the correct list:
> > -rw-rw-r-- 1 henrik henrik 12694 Jan 13 12:15 2015-01-13T11:15:24Z-build.release-mozilla-beta-firefox_updates.0.started
> > -rw-rw-r-- 1 henrik henrik 17178 Jan 13 12:45 2015-01-13T11:45:13Z-build.release-mozilla-beta-firefox_updates.0.finished
> -rw-rw-r-- 1 henrik henrik 16525 Jan 13 12:45 2015-01-13T11:45:47Z-build.release-mozilla-beta-firefox_updates.0.log_uploaded
Here we have the .finished message coming in 30s before the .log_uploaded.
Comment 3•10 years ago
|
||
We don't want to do this categorically, although we may want to add support for certain message types.
The reason we use .log_uploaded at present is that many consumers are interested in the logs (this is especially true for listeners that watch for unittest messages), and the logs are not guaranteed to be available when the .finished message is published. Switching from .log_uploaded to .finished would force those consumers to poll for logs, which is undesirable both from a design perspective and from a network perspective.
What we really want to do, instead, is figure out which message types that we care about that don't have .log_uploaded messages and use .finish for those (we already do this for pulse messages for builds). Alternately, we could support both and allow the consumer to specify which to listen for.
Assignee | ||
Comment 4•10 years ago
|
||
(In reply to Jonathan Griffin (:jgriffin) from comment #3)
> The reason we use .log_uploaded at present is that many consumers are
> interested in the logs (this is especially true for listeners that watch for
> unittest messages), and the logs are not guaranteed to be available when the
> .finished message is published. Switching from .log_uploaded to .finished
> would force those consumers to poll for logs, which is undesirable both from
> a design perspective and from a network perspective.
Sounds reasonable in case of those two messages are send in a non-defined order. I thought so far that .finish means that really all necessary tasks have been finished. Also including uploading the logs. But it looks like it is not the case.
> What we really want to do, instead, is figure out which message types that
> we care about that don't have .log_uploaded messages and use .finish for
> those (we already do this for pulse messages for builds).
For real builds we do not use .finished yet. We early return in the pulse callback:
https://github.com/mozilla/pulsetranslator/blob/master/pulsetranslator/pulsetranslator.py#L375
So what does .finished exactly mean? Is this message send when the builds have been uploaded?
> Alternately, we could support both and allow the consumer to specify which to listen for.
I like that idea even more! If that would work, I can do this change.
Comment 5•10 years ago
|
||
catlee can correct me if I'm wrong, but I believe .finished means that buildbot is done. However, log uploading is done somewhat asynchronously, so .log_uploaded can happen before or after .finished.
> > Alternately, we could support both and allow the consumer to specify which to listen for.
>
> I like that idea even more! If that would work, I can do this change.
Sure, I think that's fine as long as we default to the current behavior.
Comment 6•10 years ago
|
||
I don't think there's much for me to add here; I support including both .finished and .log_uploaded.
Flags: needinfo?(mcote)
Comment 7•10 years ago
|
||
Jonathan is right, .log_uploaded is submitted after .finished.
Flags: needinfo?(catlee)
Assignee | ||
Comment 8•10 years ago
|
||
(In reply to Chris AtLee [:catlee] from comment #7)
> Jonathan is right, .log_uploaded is submitted after .finished.
Ok, but what about those inconsistencies then? There are lots of tasks which show up the opposite order. Means .log_uploaded first, and afterward .finished. Here just an example again beside the repack one from comment 1:
-rw-rw-r-- 1 mozauto mozauto 5459 Jan 13 06:00 2015-01-13T14:00:17Z-build.release-mozilla-beta-macosx64_update_verify_2.21.log_uploaded
-rw-rw-r-- 1 mozauto mozauto 5314 Jan 13 06:00 2015-01-13T14:00:25Z-build.release-mozilla-beta-macosx64_update_verify_2.21.finished
In case of builds being uploaded, will .log_uploaded or .finished be safer?
Flags: needinfo?(catlee)
Comment 9•10 years ago
|
||
rabbitmq doesn't guarantee message delivery order; also there will be times when sending message (1) fails for various reasons, and so it gets put into a queue to be re-submitted. In the meanwhile message (2) is submitted and succeeds right away. The timestamps inside the messages should indicate the correct order.
If you see .finished, that means that the job itself is done on buildbot. When you see .log_uploaded, it means that the log for that job has been processed and uploaded successfully. Obviously the job needs to finish before the log is processed and uploaded.
Flags: needinfo?(catlee)
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Summary: [pulsetranslator] Consumers should listen for routing keys ending with .finished instead of .log_uploaded → [pulsetranslator] Do not skip messages with routing keys ending with .finished instead of .log_uploaded
Assignee | ||
Comment 10•10 years ago
|
||
Mark or Jonathan, can one of you please review? Not sure if we also want to take it for unittests, but I think it would be a good idea.
Consumers who do not correctly check the type of the message might run jobs twice with that change pushed to production. Would we have to inform someone?
Attachment #8563035 -
Flags: review?(mcote)
Attachment #8563035 -
Flags: review?(jgriffin)
Comment 11•10 years ago
|
||
Comment on attachment 8563035 [details] [diff] [review]
patch v1
Review of attachment 8563035 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm
Attachment #8563035 -
Flags: review?(mcote) → review+
Assignee | ||
Comment 12•10 years ago
|
||
Comment on attachment 8563035 [details] [diff] [review]
patch v1
Thanks Mark. I assume it is enough to get r+ from you here. So removing jgriffin.
Attachment #8563035 -
Flags: review?(jgriffin)
Assignee | ||
Comment 13•10 years ago
|
||
Landed as:
https://github.com/mozilla/pulsetranslator/commit/bd25f2091fef7d5618bd2bb52b57464bb807fb1e
Mark, would you mind to publish this changes to production? Thanks.
Flags: needinfo?(mcote)
Comment 14•10 years ago
|
||
Done!
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(mcote)
Resolution: --- → FIXED
Comment 15•10 years ago
|
||
Are we sure this is safe? It looks like it might result in duplicate unittest messages being delivered, once for .log_uploaded, and once for .finished.
Comment 16•10 years ago
|
||
Ah I was presuming we preserve that part of the routing key... apparently I don't know enough about pulsetranslator to be reviewing code. :) Should I back this out?
Comment 17•10 years ago
|
||
I'll run a test to see if my guess is correct, first.
Assignee | ||
Comment 18•10 years ago
|
||
Jonathan might be right, at least in terms of clients which do not look out for the suffix. This would imply pulsebuildmonitor:
http://hg.mozilla.org/automation/pulsebuildmonitor/file/c392765232c6/pulsebuildmonitor/pulsebuildmonitor.py#l24
Not sure what the best action is here. Trying to update all clients first and wait until all consumers have updated clients, and push this change then, or see which consumers are misbehaving. Maybe we should at least get pulsebuildmonitor fixed first?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 19•10 years ago
|
||
I think the right solution is to make pulsebuildmonitor handle repacks a little differently, since it seems to be those that are at issue here, and not the unittest jobs in general, correct?
Assignee | ||
Comment 20•10 years ago
|
||
I cannot say for sure which specific routing keys would be affected here. Given by Ben and Chris all tasks should have a .finished and .log_uploaded message. It's just not sure in which order those appear.
Maybe we should get it backed-out and leave it by only processing .log_uploaded messages? If we are really in trouble we could do special casing as you suggest.
If you wanna do a backout please do it yourself. I will have to head out in a bit for the next two days.
Comment 21•10 years ago
|
||
I did back this out, it was causing a pileup in the pulsetranslator, because it was waiting for logs to become available (due to a .finished message) when they weren't. Because pulsetranslator was changed to be single-threaded, this basically caused us to stop processing messages.
Assignee | ||
Comment 22•10 years ago
|
||
Given that problem and that we do not know how consumers would react lets wontfix this bug for now. If there is really a need later we could reopen it.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•