There have been many problems with pulsetranslator lately, and we don't have good insight into what's going on. Given that several services rely on the normalized-build stream, we need to monitor it until we're sure it's running well. A basic test app would listen to both the "raw" and the normalized exchanges and record basic information from each message in order to correlate them. The goal is to ensure that (a) each raw message is translated in a timely manner and (b) there are no duplicate messages (both problems were mentioned in bug 1094272). This is a bit complicated, given that not every message is normalized and that normalized messages carry no reference to the original message. The latter is easy to fix: we should preserve the original name in a new property. I'll file a bug for that. The former will be a bit trickier and will require going through the pulsetranslator source to figure out which messages should not be tracked. If a certain time elapses without seeing the associated normalized message, or if a duplicate is seen, we should log a message and ideally send out an email.
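The correlation logic described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the monitor's actual code: the `Correlator` class, its method names, and the `key` argument are all hypothetical, and `key` stands in for whatever identifier ends up linking a normalized message back to its raw message (for example the preserved original name proposed above). It also assumes that only raw messages which are expected to be normalized get passed to `on_raw`.

```python
import time


class Correlator:
    """Hypothetical helper: match raw messages to normalized ones by key."""

    def __init__(self, timeout=60.0, clock=time.monotonic):
        self.timeout = timeout  # seconds allowed for a translation to appear
        self.clock = clock      # injectable so the logic can be tested
        self.pending = {}       # key -> time the raw message was seen
        self.seen = set()       # keys that already had a normalized message

    def on_raw(self, key):
        """Record a raw message that is expected to be normalized."""
        self.pending.setdefault(key, self.clock())

    def on_normalized(self, key):
        """Classify a normalized message as 'ok', 'duplicate' or 'unexpected'."""
        if key in self.seen:
            return "duplicate"
        self.seen.add(key)
        if self.pending.pop(key, None) is None:
            # No raw message was recorded for this key.
            return "unexpected"
        return "ok"

    def overdue(self):
        """Return keys whose normalized message has not arrived in time."""
        now = self.clock()
        return sorted(k for k, t in self.pending.items()
                      if now - t > self.timeout)
```

The monitor would call `overdue()` periodically and log (or email) anything it returns, along with any "duplicate" result. Note that `seen` grows without bound in this sketch; an on-demand debugging tool may tolerate that, but a long-running one would need to expire old keys.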
I'll start looking into this one.
Assignee: nobody → mcc.ricardo
Some questions I asked Mark outside Bugzilla that are relevant to this bug:

> The basic idea is that we would create an app/service that would monitor pulse,
> decide which messages should be translated and check if pulsetranslator is doing
> the translation correctly and in a timely manner.

Yeah, a consumer for each stream, "raw" and normalized, and some logic to compare them.

> Should this be more of a background service or some sort of app
> with possibly some web interface of some kind?

I think a background service, with logs, is all we need here.

> Should the code we develop for this be included in the
> pulsetranslator code base or should this be a separate project?

Separate, for now at least. I don't foresee this as something that will be run forever; we can just use it on demand to debug problems.
I believe we can take one of two approaches here: 1) Create a script that is run by a cron job. 2) Create a script that runs a loop and is started and monitored by supervisord.
It will have to run continuously, since it launches two consumers that remain connected and consume messages in order to compare them. So option 2), but it'll have to be multithreaded or multiprocess, probably, and the looping will be implicit in the consumer listen() calls.
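A rough sketch of that layout, with two consumers feeding one comparison loop through a queue, might look like the following. Everything here is an assumption made for illustration: `run_consumer` and `collect` are hypothetical names, the consumers are simulated with plain iterables instead of real Pulse consumers blocking in listen(), and threads are used only to keep the example short. A multiprocess version would have the same shape, with processes and a process-safe queue in place of the threads and `queue.Queue`.

```python
import queue
import threading


def run_consumer(source, name, events):
    """Stand-in for a blocking consumer loop: forward each key to the queue."""
    for key in source:
        events.put((name, key))
    events.put((name, None))  # sentinel: this consumer has finished


def collect(raw_source, normalized_source):
    """Run both stand-in consumers and gather their events in one place."""
    events = queue.Queue()
    threads = [
        threading.Thread(target=run_consumer,
                         args=(raw_source, "raw", events)),
        threading.Thread(target=run_consumer,
                         args=(normalized_source, "normalized", events)),
    ]
    for t in threads:
        t.start()

    received = []
    finished = 0
    while finished < len(threads):
        name, key = events.get()  # blocks until either consumer reports
        if key is None:
            finished += 1
        else:
            received.append((name, key))  # the comparison logic would go here

    for t in threads:
        t.join()
    return received
```

In the real monitor the consumers would never finish on their own, so the sentinel handling would only matter on shutdown; it is included here so the sketch terminates.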
Maybe Celery (http://www.celeryproject.org/) can lend us a hand on this.
You could try that, although personally I would probably use stuff from the standard library, e.g. multiprocessing. Check out mozillapulse/test/runtests.py where we have a publisher and a consumer running in separate processes and communicating via a queue.
Precisely! I was looking through the code you mentioned this morning before coming into work. I believe that to be the better option.
I'll use this repository: https://github.com/mccricardo/pulsetranslator_monitor
In Pulse, there are several consumers for different exchanges. I believe we should monitor all those exchanges.
The only consumers that are relevant for this bug are the BuildConsumer and the NormalizedBuildConsumer.