Bug 1538202 Comment 15 Edit History

We haven't seen another issue in 3 days.

Couple of interesting things about this issue:

1. It never affected publishing crash ids. That code path does synchronous publishing that doesn't use threads. It does that to guarantee that it either publishes the crash id or gets an error, in which case it tosses the crash id in a queue to try again later. It's easier to prove correctness if that whole operation is happening "here and now" rather than "at some point, maybe, in an execution context far far away".

2. It only affected the `/__heartbeat__` endpoint, which calls `publisher.get_topic()` to make sure the topic exists and Antenna has access to it. That code path uses threads in grpcio.
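
To make the "here and now" idea in item 1 concrete, here's a rough sketch of the publish-or-queue pattern. This is not Antenna's actual code: `CrashPublisher`, `publish_fn`, and `retry_pending` are hypothetical names, and `publish_fn` stands in for whatever blocking call does the real publish and raises on failure.

```python
from collections import deque


class CrashPublisher:
    """Publish crash ids synchronously; queue failures for a later retry."""

    def __init__(self, publish_fn):
        # publish_fn is a hypothetical stand-in: a callable that blocks until
        # the crash id is published and raises an exception if it fails.
        self.publish_fn = publish_fn
        self.retry_queue = deque()

    def publish(self, crash_id):
        """Return True if published now, False if queued for retry."""
        try:
            # Everything happens in the calling thread, so when this line
            # returns we know the outcome one way or the other.
            self.publish_fn(crash_id)
        except Exception:
            self.retry_queue.append(crash_id)
            return False
        return True

    def retry_pending(self):
        """Make one pass over the queued crash ids."""
        # Only loop over what was queued at the start, so ids that fail
        # again are re-queued without causing an endless loop.
        for _ in range(len(self.retry_queue)):
            self.publish(self.retry_queue.popleft())
```

Because there's no background thread or future involved, every crash id is in exactly one of two states after `publish()` returns: published, or sitting in `retry_queue`.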

So one of the things I did after landing the last fix was to hammer the `/__heartbeat__` endpoint to try to trigger the issue rather than wait for the stars to align and it to get triggered on its own. I wasn't able to trigger it. That's not proof it's fixed, but it increased my confidence.
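
For anyone who wants to do something similar, a minimal sketch of that kind of hammering with just the standard library could look like the following. It's not the script I actually ran; the function names, the request counts, and the URL in the usage comment are all made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code, or the error class name on failure."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as exc:
        # Non-2xx responses still carry a status code worth counting.
        return exc.code
    except (URLError, OSError) as exc:
        # Connection refused, timeouts, DNS failures, and so on.
        return type(exc).__name__


def hammer(url, total=1000, workers=20, fetch=fetch_status):
    """Hit url `total` times from `workers` threads and tally the outcomes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(fetch, [url] * total))


# Example usage (the host and port are assumptions):
#   print(hammer("http://localhost:8000/__heartbeat__"))
```

The returned `Counter` maps each status code or error name to how many times it showed up, so anything other than a single entry for 200 is worth a closer look.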

Anyhow, no instances in 3 days. I think we're good now. I'm going to mark this as FIXED.
