Testdaybot is not active on #qa (seems to go down frequently)

RESOLVED WONTFIX

Status

Product: Infrastructure & Operations
Component: WebOps: IT-Managed Tools
Status: RESOLVED WONTFIX
Opened: 2 years ago
Closed: a year ago

People

(Reporter: FlorinMezei, Assigned: atoll)

Tracking

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/4092])

(Reporter)

Description

2 years ago
The TestDayBot seems to not be active again on IRC #qa. We use it to run regular testdays, about every two weeks. Over the past couple of months it has been mostly down. Henrik Skupin helped restart it, but we could probably use some investigation into what is causing it to go down and not recover.

I think the TestDayBot is "node bot.js".

Updated

2 years ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3343]
Yeah, each of the last times a process (maybe dead) was still around, so I killed it. Given that this is a service, the bot gets restarted automatically.
Sounds WORKSFORME, but if it isn't, please feel free to reopen and move it to a QA-related component.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WORKSFORME
This is not WFM. I was talking about past outages, not the current one. Something is definitely wrong with this installed service.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Here are more details. The bot is running on mozqa.com (mozqa1.qa.scl3) and was initially set up by IT via bug 918266.
I force-killed the process to allow QA to run its testday today:

[hskupin@mozqa1.qa.scl3 ~]$ ps -ef | grep bot
root     17108  2122  0 Aug11 ?        00:00:07 node bot.js https://etherpad.mozilla.org/testday-20130927
hskupin  24944 24747  0 00:16 pts/0    00:00:00 grep bot
[hskupin@mozqa1.qa.scl3 ~]$ sudo kill -9 17108
[hskupin@mozqa1.qa.scl3 ~]$ ps -ef | grep bot
root     24953  2122  5 00:16 ?        00:00:00 node bot.js https://etherpad.mozilla.org/testday-20130927
hskupin  24960 24747  0 00:16 pts/0    00:00:00 grep bot

The bot is back in #qa now.
Oh, I found the docs now:
https://mana.mozilla.org/wiki/display/websites/mozqa.com#mozqa.com-IRCBot

So the next time it happens, we should check /tmp/testdaybot-run.log for failures before restarting the bot.
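
For reference, a minimal sketch of that check, assuming only the log path mentioned above. The function name and the 20-line tail are illustrative choices, not part of the bot's tooling:

```shell
#!/bin/sh
# Sketch: inspect the bot's run log before force-restarting it.
# The default path comes from this bug; override LOG if it differs.
LOG="${LOG:-/tmp/testdaybot-run.log}"

show_bot_log() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "log not readable: $log" >&2
        return 1
    fi
    # The modification time shows when the log was last written.
    ls -l "$log"
    # The last lines are the most likely place for a failure message.
    tail -n 20 "$log"
}

# Usage: show_bot_log "$LOG"
```

Saving the output somewhere before killing the process would keep the evidence around for whoever investigates.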

Florin, please let us know via this bug when the bot is gone again. Thanks.
(Reporter)

Comment 7

2 years ago
Bug is gone again guys.
Flags: needinfo?(rsoderberg)
Flags: needinfo?(hskupin)
(Reporter)

Comment 8

2 years ago
I mean "the testdaybot is gone again"
The log was last written at 7:45am PDT today:
-rw-r--r-- 1 root root 479 Sep  1 07:45 /tmp/testdaybot-run.log

The log actually doesn't contain anything useful to act on:

+ DATA_DIR=/usr/local/testdaybot
+ cd /usr/local/testdaybot
+ git fetch -p origin
+ git reset --hard
HEAD is now at eaecd7e Minor change to readme for help command
++ git rev-parse @
+ LOCAL=eaecd7e3bac0bc0a3ac78cfbd75bebc86972adb5
++ git rev-parse '@{u}'
+ REMOTE=eaecd7e3bac0bc0a3ac78cfbd75bebc86972adb5
++ git merge-base @ '@{u}'
+ BASE=eaecd7e3bac0bc0a3ac78cfbd75bebc86972adb5
+ '[' eaecd7e3bac0bc0a3ac78cfbd75bebc86972adb5 = eaecd7e3bac0bc0a3ac78cfbd75bebc86972adb5 ']'
+ :
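
The log above looks like a shell xtrace of the wrapper's git update check only: all three hashes match, so the no-op `:` branch runs, and nothing in it comes from the bot itself. A rough reconstruction of that comparison, assuming the common LOCAL/REMOTE/BASE idiom. The actual script is not shown in this bug, and every branch other than the matching one is a guess:

```shell
#!/bin/sh
# Sketch: classify a checkout from three commit hashes.
# Only the "up-to-date" case is visible in the trace; the rest is assumed.
classify_checkout() {
    local_rev="$1"
    remote_rev="$2"
    base_rev="$3"
    if [ "$local_rev" = "$remote_rev" ]; then
        echo "up-to-date"      # the case seen in the log
    elif [ "$local_rev" = "$base_rev" ]; then
        echo "behind"          # upstream has new commits
    elif [ "$remote_rev" = "$base_rev" ]; then
        echo "ahead"           # local has unpushed commits
    else
        echo "diverged"
    fi
}

# Usage inside the checkout, with the commands that appear in the trace:
#   classify_checkout "$(git rev-parse @)" \
#                     "$(git rev-parse '@{u}')" \
#                     "$(git merge-base @ '@{u}')"
```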

Florin, I assume the bot is needed for tomorrow? Maybe Richard can find a few minutes to have a look. If not, I will force-restart it again tomorrow.
Flags: needinfo?(hskupin)
(Reporter)

Comment 10

2 years ago
We'll need the testdaybot for next week; I just noticed it was down again today.
Emailing webops to try and find someone with knowledge here.
Flags: needinfo?(rsoderberg)
(Reporter)

Comment 12

2 years ago
Any progress on this? It's been one week with the TestDayBot still down. We'd need it tomorrow for our Aurora 50 Testday.
Henrik/Florin,

Sorry, but I'm not really sure where this bot runs from or whether we maintain that server. Does either of you know where it runs? Then I can either assign someone to look into this or move this bug to the right team/component.

Thanks!
Flags: needinfo?(hskupin)
Flags: needinfo?(florin.mezei)
Shyam, please see comment 5 for details.
Flags: needinfo?(hskupin)
(Reporter)

Updated

2 years ago
Flags: needinfo?(florin.mezei)

Updated

2 years ago
Assignee: server-ops-webops → rsoderberg
Henrik,

I'm not really sure this is our service to maintain; we don't know anything about it. You can probably write a simple cron job that kicks the bot when it drops off IRC so that it reconnects automatically.

Sorry!
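
A minimal sketch of such a cron watchdog, under stated assumptions: the health check command is hypothetical (nothing in this bug provides a way to test IRC presence), the process pattern is taken from the ps output earlier in this bug, and the script relies on the service respawning the bot after a kill, as it did there.

```shell
#!/bin/sh
# Hypothetical cron watchdog: if a health check fails, kill the bot and
# let the service respawn it.

# Assumption: a command that exits 0 while the bot is on IRC. No such
# check exists in this bug; the default "true" makes this script a no-op.
HEALTH_CHECK="${HEALTH_CHECK:-true}"
# Assumption: pattern matching the bot's command line, as seen in ps.
BOT_PATTERN="${BOT_PATTERN:-node bot.js}"

kick_bot_if_unhealthy() {
    if sh -c "$HEALTH_CHECK" >/dev/null 2>&1; then
        echo "bot healthy, nothing to do"
        return 0
    fi
    # pkill -f matches the pattern against the full command line.
    if pkill -KILL -f "$BOT_PATTERN"; then
        echo "bot killed, waiting for the service to respawn it"
    else
        echo "no matching bot process found" >&2
        return 1
    fi
}

kick_bot_if_unhealthy

# Example crontab entry (script path is hypothetical), every 5 minutes:
#   */5 * * * * /usr/local/bin/testdaybot-watchdog.sh
```

Since the bot process runs as root, the cron job would need to run as root as well. Note that checking only for a running process would not help here, because a process was still present during the outages.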
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → a year ago
Resolution: --- → WONTFIX

Updated

a year ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3343] → [kanban:https://webops.kanbanize.com/ctrl_board/2/3936]

Updated

a year ago
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Status: REOPENED → RESOLVED
Last Resolved: a year ago → a year ago
Resolution: --- → FIXED

Updated

a year ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3936] → [kanban:https://webops.kanbanize.com/ctrl_board/2/4020]

Updated

a year ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Last Resolved: a year ago → a year ago
Resolution: --- → FIXED

Updated

a year ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/4020] → [kanban:https://webops.kanbanize.com/ctrl_board/2/4092]

Updated

a year ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
My apologies for all the spam. Sometimes automation can be a pain, and this is clearly one of those times. We'll try to fix this before we go through this process again.
Status: REOPENED → RESOLVED
Last Resolved: a year ago → a year ago
Resolution: --- → FIXED
Well, as you said earlier, this is a WONTFIX. Nothing has been fixed by IT here. :)
Resolution: FIXED → WONTFIX