Closed Bug 822372 Opened 12 years ago Closed 11 years ago

[stage] ADU injection failed on 12/17

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: laura, Unassigned)

Details

It was retried and succeeded later, but I'd like to know the cause.  Log info from the Metrics side is as follows:

2012/12/16 20:52:11 - Export to Socorro - Loading transformation from XML file [file:///opt/pentaho/kettle/etl/logProcessing/etl/timestamped_log_manager/../socorro/extract_adu.ktr]
2012/12/16 20:52:11 - extract_adu - Dispatching started for transformation [extract_adu]
2012/12/16 20:52:11 - Table output.0 - Connected to database [socorro] (commit=5000)
2012/12/16 20:52:11 - Delete adu records.0 - Finished reading query, closing connection.
2012/12/16 20:52:11 - Log deletes.0 - Optimization level not specified.  Using default of 9.
2012/12/16 20:52:11 - Delete adu records.0 - Finished processing (I=0, O=0, R=0, W=1, U=0, E=0)
2012/12/16 20:52:11 - Log deletes.0 - Deleted 0 records from socorro-stage-zlb.db.phx1.mozilla.com.breakpad.raw_adu for date = '2012-12-16'
2012/12/16 20:52:11 - Log deletes.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : Because of an error, this step can't continue:
2012/12/16 20:58:12 - extract_adu - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : Errors detected!
2012/12/16 20:58:12 - extract_adu - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : Errors detected!
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : org.pentaho.di.core.exception.KettleException:
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : Error inserting row into table [raw_adu] with values: [2012/12/16 00:00:00.000], [ec8030f7-c20a-464f-9b0e-13a3a9e97384], [Firefox], [13.0], [Linux], [20120530124445], [beta], [1]
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : Error inserting/updating row
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : An I/O error occured while sending to the backend.
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:445)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:128)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at java.lang.Thread.run(Thread.java:662)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : Error inserting/updating row
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : An I/O error occured while sending to the backend.
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.pentaho.di.core.database.Database.insertRow(Database.java:1413)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:284)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         ... 3 more
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : Caused by: org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:283)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:498)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:386)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:332)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.pentaho.di.core.database.Database.insertRow(Database.java:1362)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         ... 4 more
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : Caused by: java.io.EOFException
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:261)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1635)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
2012/12/16 20:58:12 - Table output.0 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) :         ... 8 more
2012/12/16 20:58:12 - Count inserts.0 - Finished processing (I=0, O=0, R=18192, W=0, U=0, E=0)
2012/12/16 20:58:12 - Table output.0 - Finished processing (I=0, O=18192, R=18193, W=18192, U=0, E=1)
2012/12/16 20:58:12 - extract_adu - extract_adu
2012/12/16 20:58:12 - extract_adu - extract_adu
eod_socorro_20121216_205211.log
Source:  etl1.metrics.scl3.mozilla.com
Destination: socorro-stage-zlb.db.phx1.mozilla.com:5432
(backend zeus cluster)
Assignee: nobody → dmaher
Component: Database → Server Operations: Web Operations
Product: Socorro → mozilla.org
QA Contact: nmaul
Version: unspecified → other
Assignee: dmaher → server-ops-webops
This is old enough now that I suspect it will be more or less impossible to nail down precisely what went wrong. Things have changed, logs are rotated, etc.

The one thing I can pull out of that is a few lines like this:

Caused by: org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.

I suspect this is actually a network/connection rather than disk I/O. Beyond that though, I don't have a good idea on where to look.

I note that the timing is almost exactly 6 minutes since the last "good" line and the first error. I suspect a timeout somewhere, but I don't know where it might be... Zeus has these timeouts on this VIP / virtual server disabled. Could be on the backend servers, maybe. In any case, if this is the issue (just a guess), the best course of action is for this job to intelligently disconnect when done with database work for a while, then reconnect when necessary.

If Metrics can increase the log verbosity on their side, it might help to track this down. In the meantime, I don't think there's anything more we can feasibly do here, so I'm going to close this out.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INCOMPLETE
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in before you can comment on or make changes to this bug.