Closed Bug 922339 Opened 11 years ago Closed 11 years ago

builds-running.js and builds-pending.js frequently getting 503 errors

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
blocker

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 922275

People

(Reporter: KWierso, Unassigned)

***** Nagios  *****

Notification Type: PROBLEM

Service: http file age - /buildjson/builds-4hr.js.gz
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL

Date/Time: 09-30-2013 14:02:04

Additional Info:
HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:11:32 ago - 1439848 bytes in 3.676 second response time

  

http://m.allizom.org/http%2Bfile%2Bage%2B-%2B/buildjson/builds-4hr.js.gz
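The alert above fires on the age of the file rather than on the HTTP status, which is why a "200 OK" response is still reported as CRITICAL. A minimal sketch of that kind of check follows. The 10-minute threshold, the function name and the Last-Modified value are assumptions for illustration (the header value is back-computed from "0:11:32 ago"); the real Nagios configuration is not shown in this bug.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical threshold; the actual check's limit is not given in this bug.
CRITICAL_AGE = timedelta(minutes=10)


def file_age_state(last_modified_header, now, critical_age=CRITICAL_AGE):
    """Return (state, age) for an HTTP Last-Modified header value."""
    modified = parsedate_to_datetime(last_modified_header)
    age = now - modified
    state = "CRITICAL" if age > critical_age else "OK"
    return state, age


# 14:02:04 PDT on 2013-09-30 is 21:02:04 UTC.
now = datetime(2013, 9, 30, 21, 2, 4, tzinfo=timezone.utc)
# Illustrative header value, 11 minutes 32 seconds before "now".
state, age = file_age_state("Mon, 30 Sep 2013 20:50:32 GMT", now)
print(state, age)  # prints: CRITICAL 0:11:32
```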
This is actually probably the same thing as bug 922275...
Email from: root@buildapi01.build.scl1.mozilla.com
Time: 4:56 PM
Subject: Cron <root@buildapi01> sudo -u buildapi /usr/local/bin/update_hg_wc.sh /home/buildapi/src && /etc/init.d/buildapi restart
abort: error: Temporary failure in name resolution
0 files updated, 0 files merged, 0 files removed, 0 files unresolved


Email from: root@buildapi01.build.scl1.mozilla.com
Time: 5:16 PM
Subject: Cron <buildapi@buildapi01> /home/buildapi/bin/report-today.sh
Traceback (most recent call last):
  File "/home/buildapi/src/buildapi/scripts/reporter.py", line 356, in <module>
    report = build_report(R, session, scheduler_db_engine, starttime, endtime)
  File "/home/buildapi/src/buildapi/scripts/reporter.py", line 124, in build_report
    masters[build.master_id] = {'name': build.master.name,
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/orm/attributes.py", line 168, in __get__
    return self.impl.get(instance_state(instance),dict_)
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/orm/attributes.py", line 420, in get
    value = self.callable_(state, passive)
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/orm/strategies.py", line 526, in _load_for_state
    return q._load_on_ident(ident_key)
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/orm/query.py", line 2071, in _load_on_ident
    return q.one()
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/orm/query.py", line 1744, in one
    ret = list(self)
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/orm/query.py", line 1787, in __iter__
    return self._execute_and_instances(context)
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/orm/query.py", line 1802, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 1358, in execute
    params)
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 1491, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 1599, in _execute_context
    context)
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 1592, in _execute_context
    context)
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/engine/default.py", line 325, in do_execute
    cursor.execute(statement, parameters)
  File "/home/buildapi/lib/python2.6/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/home/buildapi/lib/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
sqlalchemy.exc.OperationalError: (OperationalError) (2013, 'Lost connection to MySQL server during query') 'SELECT masters.id AS masters_id, masters.url AS masters_url, masters.name AS masters_name \nFROM masters \nWHERE masters.id = %s' (91L,)
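The traceback above shows the report script dying on the first lost connection. One common way to tolerate a short database or DNS blip is to retry the failing operation a few times before giving up. The sketch below is not what reporter.py does; `TransientDBError` is a hypothetical stand-in for `sqlalchemy.exc.OperationalError`, and `run_with_retry` and `flaky_query` are illustrative names.

```python
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for sqlalchemy.exc.OperationalError."""


def run_with_retry(operation, attempts=3, delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on TransientDBError up to `attempts` times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDBError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait before the next try
    raise last_error


# Demonstration with a fake query that fails twice, then succeeds.
calls = []


def flaky_query():
    calls.append(1)
    if len(calls) < 3:
        raise TransientDBError("Lost connection to MySQL server during query")
    return "report"


result = run_with_retry(flaky_query, attempts=3, sleep=lambda seconds: None)
```

Whether retrying is appropriate depends on the query being safe to repeat, which holds for a read-only SELECT like the one in the traceback.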
16:57 nagios-releng: Mon 13:57:34 PDT [4037] buildapi01.build.scl1.mozilla.com:Ganglia IO is CRITICAL: NRPE: Command check_ganglia not defined (http://m.allizom.org/Ganglia+IO)
17:02 nagios-releng: Mon 14:02:05 PDT [4038] builddata.pub.build.mozilla.org:http file age - /buildjson/builds-4hr.js.gz is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:11:32 ago - 1439848 bytes in 3.676 second response time (http://m.allizom.org/http+file+age+-+/buildjson/builds-4hr.js.gz)
17:07 nagios-releng: Mon 14:07:34 PDT [4040] buildapi01.build.scl1.mozilla.com:Ganglia IO is OK: CHECKGANGLIA OK: cpu_wio is 0.40 (http://m.allizom.org/Ganglia+IO)
17:08 nagios-releng: Mon 14:08:34 PDT [4041] buildapi01.build.scl1.mozilla.com:procs - buildapi is CRITICAL: PROCS CRITICAL: 0 processes with regex args paster.*/home/buildapi/production.ini (http://m.allizom.org/procs+-+buildapi)
17:12 nagios-releng: Mon 14:12:03 PDT [4042] builddata.pub.build.mozilla.org:http file age - /buildjson/builds-4hr.js.gz is OK: HTTP OK: HTTP/1.1 200 OK - 1477825 bytes in 0.086 second response time (http://m.allizom.org/http+file+age+-+/buildjson/builds-4hr.js.gz)
17:13 nagios-releng: Mon 14:13:33 PDT [4043] buildapi01.build.scl1.mozilla.com:procs - buildapi is OK: PROCS OK: 1 process with regex args paster.*/home/buildapi/production.ini (http://m.allizom.org/procs+-+buildapi)
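The "procs - buildapi" alerts in the log above count processes whose arguments match the regex `paster.*/home/buildapi/production.ini`. A rough sketch of that logic, assuming a minimum of one matching process (consistent with "0 processes" being CRITICAL and "1 process" being OK); the function name and the sample command line are hypothetical.

```python
import re

# Regex taken from the alert text.
PATTERN = re.compile(r"paster.*/home/buildapi/production.ini")


def procs_state(command_lines, minimum=1):
    """Return (state, count) for the command lines matching PATTERN."""
    count = sum(1 for line in command_lines if PATTERN.search(line))
    state = "OK" if count >= minimum else "CRITICAL"
    return state, count


# Hypothetical process list while buildapi is running.
running = ["/home/buildapi/bin/paster serve /home/buildapi/production.ini"]
# While the restart is failing, no such process exists.
stopped = ["/usr/sbin/crond"]
```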
From buildapi01@/home/buildapi/buildapi.log

I've also seen the OperationalError in reporter-4hr.log.

Traceback (most recent call last):
  File "/home/buildapi/bin/paster", line 9, in <module>
    load_entry_point('PasteScript==1.7.3', 'console_scripts', 'paster')()
  File "/home/buildapi/lib/python2.6/site-packages/paste/script/command.py", line 84, in run
    invoke(command, command_name, options, args[1:])
  File "/home/buildapi/lib/python2.6/site-packages/paste/script/command.py", line 123, in invoke
    exit_code = runner.run(args)
  File "/home/buildapi/lib/python2.6/site-packages/paste/script/command.py", line 218, in run
    result = self.command()
  File "/home/buildapi/lib/python2.6/site-packages/paste/script/serve.py", line 276, in command
    relative_to=base, global_conf=vars)
  File "/home/buildapi/lib/python2.6/site-packages/paste/script/serve.py", line 313, in loadapp
    **kw)
  File "/home/buildapi/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 247, in loadapp
    return loadobj(APP, uri, name=name, **kw)
  File "/home/buildapi/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 272, in loadobj
    return context.create()
  File "/home/buildapi/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 710, in create
    return self.object_type.invoke(self)
  File "/home/buildapi/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 229, in invoke
    filtered = context.next_context.create()
  File "/home/buildapi/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 710, in create
    return self.object_type.invoke(self)
  File "/home/buildapi/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 146, in invoke
    return fix_call(context.object, context.global_conf, **context.local_conf)
  File "/home/buildapi/lib/python2.6/site-packages/paste/deploy/util.py", line 56, in fix_call
    val = callable(*args, **kw)
  File "/home/buildapi/src/buildapi/config/middleware.py", line 55, in make_app
    config = load_environment(global_conf, app_conf)
  File "/home/buildapi/src/buildapi/config/environment.py", line 52, in load_environment
    init_scheduler_model(scheduler_engine)
  File "/home/buildapi/src/buildapi/model/__init__.py", line 7, in init_scheduler_model
    scheduler_db_meta.reflect(bind=engine)
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/schema.py", line 2342, in reflect
    conn = bind.contextual_connect()
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 2284, in contextual_connect
    self.pool.connect(), 
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/pool.py", line 209, in connect
    return _ConnectionFairy(self).checkout()
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/pool.py", line 370, in __init__
    rec = self._connection_record = pool._do_get()
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/pool.py", line 696, in _do_get
    con = self._create_connection()
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/pool.py", line 174, in _create_connection
    return _ConnectionRecord(self)
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/pool.py", line 255, in __init__
    self.connection = self.__connect()
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/pool.py", line 315, in __connect
    connection = self.__pool._creator()
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/engine/strategies.py", line 80, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/buildapi/lib/python2.6/site-packages/sqlalchemy/engine/default.py", line 275, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/home/buildapi/lib/python2.6/site-packages/MySQLdb/__init__.py", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/home/buildapi/lib/python2.6/site-packages/MySQLdb/connections.py", line 187, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
sqlalchemy.exc.OperationalError: (OperationalError) (2005, "Unknown MySQL server host 'buildbot-ro-vip.db.scl3.mozilla.com' (1)") None None
Removing PID file /home/buildapi/buildapi.pid
2013-09-30 14:11:36,046 INFO  [sqlalchemy.engine.base.Engine] [MainThread] SELECT DATABASE()
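Both the cron failure ("Temporary failure in name resolution") and the startup failure above (MySQL error 2005, unknown server host) point at DNS lookups failing on buildapi01. A quick way to confirm that from the host is to try resolving the database hostname directly. This is only a diagnostic sketch: `can_resolve` is an illustrative name, and the `resolver` parameter exists so the function can be exercised without network access.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False on a name resolution error."""
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        return False


def failing_resolver(host, port):
    # Simulates the "Temporary failure in name resolution" condition.
    raise socket.gaierror(socket.EAI_AGAIN, "Temporary failure in name resolution")


def working_resolver(host, port):
    # Simulates a successful lookup; the return value is not inspected.
    return []
```

On the affected host, `can_resolve("buildbot-ro-vip.db.scl3.mozilla.com")` would be expected to return False while the outage lasts.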
Trees reopened again at 2013-09-30T14:51:07, since there have been no more issues for 30 minutes or so...

Guess we'll be back when it happens again?
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
This auto-fixed at 14:11:36.
Resolution: FIXED → DUPLICATE
Blocks: 926246
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard