Closed Bug 1118261 Opened 9 years ago Closed 5 years ago

Please add (sequential) ID to every pulse message

Categories

(Webtools :: Pulse, defect, P3)

x86_64
Windows 8.1
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ekyle, Unassigned)

Details

It would be nice to have Pulse include a tracking ID with every message.  This will help with auditing and debugging any data processing chains.  Furthermore, if Pulse's producers are emitting a similar tracking ID, include that too.

Pulse's _meta data currently looks something like:

    {
    "serializer": "json", 
    "routing_key": "unittest.try.android-api-11.blah.blah...", 
    "sent": "2015-01-06T06:38:34-08:00", 
    "exchange": "exchange/build/normalized"
    }

Please include an "ID" attribute for the purposes of tracking.  It should be:

  * dense - The closer the IDs are together numerically (or
    alphanumerically), the easier it is to determine whether data is
    missing.  The IDs need not be contiguous; after all, most listeners
    only get a small number of the total messages handled by Pulse.
  * ordered - Monotonically increasing IDs relate ID ranges to
    date ranges, which can reduce the search space when looking for missing
    data.  Combined with dense numbering, correctness can be audited
    with relatively simple algorithms.

I suggest adding a simple, persistent counter to every queue.
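
A minimal sketch of what such a counter could look like, assuming a single publisher process per exchange and a local file for persistence. The names `PersistentCounter` and `stamp`, the file-based storage, and the lowercase "id" key are all hypothetical; none of this is part of Pulse or mozillapulse.

```python
import os


class PersistentCounter:
    """Hypothetical monotonically increasing counter stored in a local file."""

    def __init__(self, path):
        self.path = path
        self.value = self._load()

    def _load(self):
        # Start from 0 when no counter file exists yet.
        try:
            with open(self.path) as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def next(self):
        self.value += 1
        # Write to a temporary file, then rename, so a crash mid-write
        # does not leave a truncated counter file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(self.value))
        os.replace(tmp, self.path)
        return self.value


def stamp(meta, counter):
    """Return a copy of a _meta dict with an "id" attribute added (assumed key name)."""
    stamped = dict(meta)
    stamped["id"] = counter.next()
    return stamped
```

With one counter file per exchange, every consumer of that exchange would see the same IDs. The sketch does not address several producers publishing to the same exchange.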
First, I assume by "to every queue" you actually mean "to every exchange", meaning if two consumers are listening to the same exchange, they'll see the same IDs.

Second, what are you trying to achieve here exactly?  I agree that we should be trying to ensure that messages are always delivered properly, but this doesn't seem like a great way to go about it, in part because, as you say, most consumers don't listen to all messages from a given exchange, and because there's no way to determine what the data was if you did in fact miss a message.
Yes, where I wrote "queue" I meant "exchange".

My particular problem was trying to compare the messages received by two Pulse consumers, both listening on the same exchange and the same topic.  As my programs tend to do, one failed and lost its messages.  I needed to perform a diff to find what I lost.  For the unittests, I can assume every message is unique, so a hash worked fine, but in general I do not see how Pulse messages are necessarily unique.  Also, I was lucky to have lost only about 10K messages, so they fit in memory and I could perform a set-wise subtraction on their hash keys with a simple program.
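
The hash-based diff described above can be sketched roughly as follows. This is an illustration and not the program actually used: it assumes messages are JSON-serializable dicts, that every message is unique, and that both sets fit in memory. The helper names `message_key` and `missing_messages` are made up for this example.

```python
import hashlib
import json


def message_key(message):
    """Hash a message via a canonical JSON encoding (sorted keys)."""
    canonical = json.dumps(message, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def missing_messages(complete, partial):
    """Return the messages in `complete` whose hash is absent from `partial`."""
    seen = {message_key(m) for m in partial}
    return [m for m in complete if message_key(m) not in seen]
```

If two distinct messages can have identical content, this approach undercounts losses, which is one reason to want an explicit ID.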

Considering other problems I have experienced in the past: 

  * All loggers are dropping messages - Dense IDs will help indicate
    potential losses because we can expect a density independent of time.
    Using only timestamps to detect losses gets complicated, or
    impossible, around weekends and holidays.
  * Messages are dropped silently over a long period of time - An in-depth
    analysis covering more than a couple of hours will require more code,
    and libraries to manage memory, to perform the same analysis.
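
As a rough sketch of how dense, ordered IDs could flag potential losses without relying on timestamps: scan the IDs a consumer received and report any jump larger than the density it normally sees. The function name `suspicious_gaps` and the `max_gap` threshold are assumptions for illustration; a real threshold would depend on how small a share of an exchange's messages the consumer receives.

```python
def suspicious_gaps(ids, max_gap):
    """Return (previous_id, next_id) pairs whose distance exceeds max_gap."""
    ordered = sorted(set(ids))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]
```

Each reported pair bounds an ID range, and therefore a date range, in which to look for missing data.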

Furthermore, if/when all Pulse messages are recorded to long-term storage, they can be used to verify the long-term correctness and stability of other consumers.
Here is an example of a missing Pulse record [1].  With dense increasing IDs, it would be easy to confirm on which side of the queue the messages were lost.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1168349
We probably can't enforce this.  We could add something to the mozillapulse Publisher code, but currently that doesn't have any persistent data, so I'm not sure it's a good idea to make it mandatory.  We could add it to the package, though, and require individual publisher shims to configure and activate it.  We can also add it to the spec as a "SHOULD" item.
Priority: -- → P3

This would have to be done in the producer and, as Mark suggests in comment 4, would be hard (and impossible to get high density) with distributed producers.

I think this should be filed as an expectation of things that publish to pulse (many of which may call it WONTFIX on the above basis).

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX