The message counters do not appear to be advancing on the CEP nodes, although the DWL node (the important one!) seems to be advancing as expected.
hekad-cep was restarted on pipeline-cep2, and it is now throwing the following error message:

> Error making runner for TelemetryKafkaInput21: Initialization failed for 'TelemetryKafkaInput21': kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition.

The input named in the error (TelemetryKafkaInput21 here) varies widely between messages. hekad-cep fails to start after about 4 seconds, so systemd keeps restarting it indefinitely. whd has been paged about this, because I don't know where to go from here.
This should be fixed now. I fixed it last week (and again just now) by switching the offset method from Manual to Newest on the CEP nodes, which is probably acceptable for monitoring hosts. I need to investigate why the CEP nodes occasionally fall behind, but that will be a separate bug. These nodes are pretty much idle, so it's not a resource issue, and the DWL nodes (which use ZooKeeper-backed offsets) don't have the issue either. Somewhat related, bug #1208570 still appears to be problematic. I will update that bug later.
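A minimal sketch of why switching to Newest avoids the startup failure, assuming the usual Kafka semantics: the broker only retains offsets in a window from the oldest retained offset up to the head of the partition, so a Manual checkpoint that has fallen behind retention is rejected with the "outside the range of offsets" error quoted above. `OffsetMethod`, `resolve_start_offset`, and `OffsetOutOfRange` are hypothetical names for illustration, not Heka's KafkaInput code.

```python
from enum import Enum


class OffsetMethod(Enum):
    """Hypothetical stand-in for the offset choices (not Heka's actual type)."""
    MANUAL = "Manual"
    NEWEST = "Newest"
    OLDEST = "Oldest"


class OffsetOutOfRange(Exception):
    """Raised when a stored offset is no longer retained by the broker."""


def resolve_start_offset(method, stored, oldest, newest):
    """Return the offset a consumer would resume from (illustrative only).

    oldest: earliest offset the broker still retains for the partition.
    newest: offset the next produced message will receive.
    stored: checkpointed offset, only consulted for MANUAL.
    """
    if method is OffsetMethod.NEWEST:
        # Jump to the head of the partition; never out of range.
        return newest
    if method is OffsetMethod.OLDEST:
        return oldest
    # MANUAL: resume from the checkpoint, but only if it is still retained.
    if stored < oldest or stored > newest:
        raise OffsetOutOfRange(
            f"offset {stored} outside retained range [{oldest}, {newest}]"
        )
    return stored


if __name__ == "__main__":
    # Hypothetical numbers: checkpoint 1000 was written before retention
    # deleted everything below offset 5000.
    try:
        resolve_start_offset(OffsetMethod.MANUAL, 1000, 5000, 9000)
    except OffsetOutOfRange as exc:
        print("Manual:", exc)  # Manual: offset 1000 outside retained range [5000, 9000]
    print("Newest:", resolve_start_offset(OffsetMethod.NEWEST, 1000, 5000, 9000))  # Newest: 9000
```

The trade-off shows up in the sketch: Newest never fails on startup, but it silently skips every message between the stale checkpoint and the current head, which is tolerable for monitoring hosts but not for nodes whose output must be complete.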
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED