Closed Bug 1152294 Opened 9 years ago Closed 9 years ago

Bugzilla Elasticsearch Cluster is Stale (has ETL stopped?)

Categories

(bugzilla.mozilla.org :: Infrastructure, defect)

Production
x86_64
Windows 7
defect
Not set
normal

RESOLVED FIXED

People

(Reporter: ekyle, Assigned: fubar)

References

(Blocks 1 open bug)

Details

The public and private clusters have stopped receiving updates from the ETL processes. The last update was around Mar 29 @ 14:46 EDT.

Please check that the ETL processes, both public and private, are running. I may require logs if they are.

[1] Diagram with bug numbers:  https://github.com/klahnakoski/Bugzilla-ETL/blob/master/docs/Architecture%20%28bug%20879822%29.png
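For anyone wanting to automate this kind of check, the staleness test can be sketched roughly as below. This is a minimal sketch and not part of Bugzilla-ETL: `is_stale`, the 24-hour threshold, the year (taken from the log timestamps later in this bug) and the check time are all assumptions for illustration. How the last-update timestamp is read from the cluster is left out.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_update, now, max_age=timedelta(hours=24)):
    """Return True when the newest update is older than max_age (hypothetical threshold)."""
    return now - last_update > max_age


# EDT is UTC-4; a fixed offset is enough for this illustration.
EDT = timezone(timedelta(hours=-4))

# Last update reported in this bug; the year is an assumption.
last_update = datetime(2015, 3, 29, 14, 46, tzinfo=EDT)

# Illustrative check time only, not a value taken from the cluster.
checked_at = datetime(2015, 4, 8, 13, 24, tzinfo=EDT)

if is_stale(last_update, checked_at):
    print("cluster looks stale; last update was", last_update.isoformat())
```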
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/194]
Flags: needinfo?(klibby)
There was a hung private process from March 29; killed it and the next cron job ran and did stuff. The public one looks like it's been running fine, from the logs.
Flags: needinfo?(klibby)
And now it looks like the private one may be wedged again?  It's still running, and the logs are showing:

2015-04-08 13:24:49.233966 - Waiting on thread "etl"

2015-04-08 13:24:56.555011 - Waiting on thread "etl"

2015-04-08 13:25:10.284315 - Waiting on thread "etl"

strace doesn't help much:
[pid 23430] futex(0x3113020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 23482] futex(0x3113020, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 23430] <... futex resumed> )       = 0
[pid 23430] futex(0x3113020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 23482] futex(0x3113020, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 23430] <... futex resumed> )       = 0
[pid 23482] futex(0x3113020, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 23430] futex(0x3113020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 23482] <... futex resumed> )       = 0
[pid 23430] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid 23430] futex(0x3113020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 23482] futex(0x3113020, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 23430] <... futex resumed> )       = 0
[pid 23430] futex(0x3113020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 23482] futex(0x3113020, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 23430] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid 23482] <... futex resumed> )       = 0
The ETL should be generating a log file (the exact location is in the setting.json file), which may give more information about what's happening (a problem connecting?).
Repeated "Waiting on thread "etl"" lines with no other log entries between them mean the etl thread is slow (or has failed). Again, the log file should tell us more.
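To show why that log pattern points at a slow or stuck worker, here is a minimal sketch of a main thread that joins a worker with a timeout and logs each time the timeout expires. It is not the ETL's actual code: `wait_on_thread`, `slow_etl` and the poll interval are hypothetical names and values.

```python
import threading
import time


def wait_on_thread(worker, poll_seconds=0.05, log=print):
    """Join `worker`, logging once per expired poll interval.

    Returns the number of times the wait timed out. If the worker never
    finishes, this loops forever and the log shows only the waiting message.
    """
    waits = 0
    while True:
        worker.join(timeout=poll_seconds)
        if not worker.is_alive():
            return waits
        waits += 1
        log('Waiting on thread "%s"' % worker.name)


def slow_etl():
    # Stand-in for a slow batch; a hung connection would block here indefinitely.
    time.sleep(0.2)


etl = threading.Thread(target=slow_etl, name="etl")
etl.start()

messages = []
waits = wait_on_thread(etl, log=messages.append)
```

A healthy worker would write its own progress lines between the waiting messages; when only the waiting messages appear, the worker is producing nothing, which matches the output pasted above.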
The ETL appears to have caught up fine.  This bug will be kept open until fubar and I are happy it will not show another problem.
Looks good!
Assignee: nobody → klibby
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 1156329