Closed
Bug 749963
Opened 13 years ago
Closed 13 years ago
[socorro] Backfill matviews for 2012-04-27
Categories
(Data & BI Services Team :: DB: MySQL, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: kairo, Assigned: mpressman)
+++ This bug was initially created as a clone of Bug #749632 +++
2012-04-27 is again missing crash matview data, the same problem we had yesterday and the day before. I'm not sure about the actual cause; I'll leave that to the Socorro team to find out. ;-)
In any case, the data needs to be backfilled.
| Reporter | ||
Updated•13 years ago
|
Severity: normal → major
Comment 1•13 years ago
|
||
Same as the past several days, the nightly jobs are hanging:
[rhelmer@sp-admin01.phx1 ~]$ ps -ef | grep python
socorro 15291 15283 0 Apr25 ? 00:00:00 /usr/bin/python2.6 /data/socorro/application/scripts/startDailyMatviews.py
socorro 15632 1 0 Apr25 ? 00:00:00 /usr/bin/python2.6 /data/socorro/application/scripts/startTopCrashesBySignature.py
However, I don't see any open queries in the DB, so I'm not sure what's going on:
breakpad=> select * from pg_stat_activity where usename = 'breakpad_rw';
 datid | datname  | procpid | usesysid | usename     | application_name | client_addr | client_port | backend_start                 | xact_start                    | query_start                   | waiting | current_query
-------+----------+---------+----------+-------------+------------------+-------------+-------------+-------------------------------+-------------------------------+-------------------------------+---------+---------------
 48819 | breakpad |   13415 |    16385 | breakpad_rw |                  |             |          -1 | 2012-04-28 22:22:01.841418+00 |                               | 2012-04-28 22:49:50.254028+00 | f       | <IDLE>
 48819 | breakpad |   25878 |    16385 | breakpad_rw |                  | 10.8.70.209 |       54992 | 2012-04-28 22:47:05.906444+00 | 2012-04-28 22:49:54.275682+00 | 2012-04-28 22:49:54.275682+00 | f       | select * from pg_stat_activity where usename = 'breakpad_rw';
 48819 | breakpad |   24721 |    16385 | breakpad_rw |                  |             |          -1 | 2012-04-28 22:45:00.670532+00 |                               | 2012-04-28 22:49:54.264654+00 | f       | <IDLE>
(3 rows)
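A minimal sketch of the reasoning behind "no open queries", assuming the rows above have already been loaded into dictionaries keyed by column name. The function name `busy_sessions` and the row layout are hypothetical and not part of Socorro. It drops sessions whose `current_query` is exactly `<IDLE>` and the monitoring query itself, and keeps everything else, including `<IDLE> in transaction` sessions, since those may still hold locks.

```python
def busy_sessions(rows, ignore_prefix="select * from pg_stat_activity"):
    """Return sessions that are doing something other than sitting idle.

    rows is a list of dicts with at least a 'current_query' key, as
    reported by pg_stat_activity on PostgreSQL releases that still have
    the current_query column (hypothetical in-memory representation).
    """
    busy = []
    for row in rows:
        query = row["current_query"].strip()
        if query == "<IDLE>":
            # Plain idle connection: no statement and no open transaction.
            continue
        if query.lower().startswith(ignore_prefix):
            # The session that is running this very check.
            continue
        # Anything left is a running statement or '<IDLE> in transaction'.
        busy.append(row)
    return busy


# The three sessions from the output above.
rows = [
    {"procpid": 13415, "waiting": False, "current_query": "<IDLE>"},
    {"procpid": 25878, "waiting": False,
     "current_query": "select * from pg_stat_activity where usename = 'breakpad_rw';"},
    {"procpid": 24721, "waiting": False, "current_query": "<IDLE>"},
]

print(busy_sessions(rows))
```

With the three rows shown in this comment the result is an empty list, which matches the observation that the hung scripts have no query running on the database side.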
Severity: major → blocker
Comment 2•13 years ago
|
||
(In reply to Robert Helmer [:rhelmer] from comment #1)
> Same as the past several days, the nightly jobs are hanging:
>
> [rhelmer@sp-admin01.phx1 ~]$ ps -ef | grep python
> socorro 15291 15283 0 Apr25 ? 00:00:00 /usr/bin/python2.6
> /data/socorro/application/scripts/startDailyMatviews.py
> socorro 15632 1 0 Apr25 ? 00:00:00 /usr/bin/python2.6
> /data/socorro/application/scripts/startTopCrashesBySignature.py
Can someone please stop these two processes on sp-admin01 and then run the backfill procedure in https://mana.mozilla.org/wiki/display/websites/Backfilling+Matview+and+ADU+Data+For+Socorro ?
Comment 3•13 years ago
|
||
Some forensics:
* cron_aggregates (old TCBS) logged this:
2012-04-28 15:02:02,000 DEBUG - getProductId
2012-04-28 15:02:02,000 DEBUG - None
2012-04-28 15:02:02,000 DEBUG - trying getDefaultDateInterval
2012-04-28 15:02:02,007 DEBUG - MainThread - Processing with interval from 2012-04-28 12:00:00, size=1:00:00)
2012-04-28 15:03:19,611 DEBUG - Returning 18 items for window [2012-04-28 12:00:00,2012-04-28 13:00:00)
2012-04-28 15:03:19,779 INFO - Successfully processed 18 items in 77.79 seconds
2012-04-28 15:03:19,779 INFO - done.
* cron_daily_matviews (new TCBS and related reports) has logged nothing
Comment 4•13 years ago
|
||
(In reply to Robert Helmer [:rhelmer] from comment #1)
> Same as the past several days, the nightly jobs are hanging:
>
> [rhelmer@sp-admin01.phx1 ~]$ ps -ef | grep python
> socorro 15291 15283 0 Apr25 ? 00:00:00 /usr/bin/python2.6
> /data/socorro/application/scripts/startDailyMatviews.py
> socorro 15632 1 0 Apr25 ? 00:00:00 /usr/bin/python2.6
> /data/socorro/application/scripts/startTopCrashesBySignature.py
Actually, looking at this more carefully, I think these processes were never stopped back on the 25th, so the cron lock checker is keeping new runs from starting. We think the problem on the 25th was likely network-related, and the processes have been hanging ever since.
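To illustrate the failure mode described here, the following is a rough sketch of a lock check that distinguishes a recently taken lock from one left behind by a run that hung days ago. It is not Socorro's actual lock checker, whose implementation is not shown in this bug; `lock_state`, the lock path, and the age threshold are all assumptions made for the example. A liveness check on the PID alone would not have helped in this case, because the hung processes were still alive, so the sketch looks at the age of the lock file instead.

```python
import os
import time


def lock_state(lock_path, max_age_seconds, now=None):
    """Classify a cron lock file as 'free', 'held', or 'stale'.

    'stale' means the lock is older than any healthy run should take,
    which suggests the owning job hung rather than finished.
    All names and thresholds here are hypothetical.
    """
    if now is None:
        now = time.time()
    try:
        locked_at = os.path.getmtime(lock_path)
    except OSError:
        # No lock file: nothing is running, the job may start.
        return "free"
    age = now - locked_at
    return "stale" if age > max_age_seconds else "held"
```

A wrapper script could skip the run on `held` as before, but raise an alert (or refuse loudly) on `stale` instead of exiting silently every night.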
| Assignee | ||
Comment 5•13 years ago
|
||
I have killed the startDailyMatviews.py and startTopCrashesBySignature.py jobs that were started on the 25th.
Comment 6•13 years ago
|
||
(In reply to Robert Helmer [:rhelmer] from comment #3)
> Some forensics:
>
> * cron_aggregates (old TCBS) logged this:
>
> 2012-04-28 15:02:02,000 DEBUG - getProductId
> 2012-04-28 15:02:02,000 DEBUG - None
> 2012-04-28 15:02:02,000 DEBUG - trying getDefaultDateInterval
> 2012-04-28 15:02:02,007 DEBUG - MainThread - Processing with interval from
> 2012-04-28 12:00:00, size=1:00:00)
> 2012-04-28 15:03:19,611 DEBUG - Returning 18 items for window [2012-04-28
> 12:00:00,2012-04-28 13:00:00)
> 2012-04-28 15:03:19,779 INFO - Successfully processed 18 items in 77.79
> seconds
> 2012-04-28 15:03:19,779 INFO - done.
OK, the theory in comment 4 makes sense: IIRC dumitru removed the cron_aggregates lock file after seeing nagios alerts about it (he let us know in IRC), so this job hasn't been causing problems.
However, daily_matviews has not been running because of its lock file. We need to improve the monitoring for this.
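One possible shape for better monitoring, sketched under assumptions: rather than watching only the lock file, alert when the job has not completed successfully for too long. The function `check_freshness`, the thresholds, and the idea of recording a last-success timestamp are hypothetical and not taken from Socorro. The return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
from datetime import datetime

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_freshness(last_success, now, warn_hours=26, crit_hours=50):
    """Return (exit_code, message) based on hours since the last good run.

    last_success and now are datetime objects in the same timezone.
    The default thresholds assume a job that should finish once a day;
    they are illustrative values only.
    """
    age_hours = (now - last_success).total_seconds() / 3600.0
    message = "last successful run %.1f hours ago" % age_hours
    if age_hours >= crit_hours:
        return CRITICAL, "CRITICAL: " + message
    if age_hours >= warn_hours:
        return WARNING, "WARNING: " + message
    return OK, "OK: " + message


# A job that last succeeded before the hang on the 25th, checked on the 28th.
code, text = check_freshness(datetime(2012, 4, 25, 0, 0),
                             datetime(2012, 4, 28, 22, 49))
print(code, text)
```

A check like this would have fired after the first missed night, regardless of whether the cause was a stale lock, a hung process, or a job that never started.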
Updated•13 years ago
|
Assignee: server-ops-database → eziegenhorn
Updated•13 years ago
|
Assignee: eziegenhorn → mpressman
| Assignee | ||
Comment 7•13 years ago
|
||
The backfill for 4/27 has been run and I've kicked memcached.
| Reporter | ||
Comment 8•13 years ago
|
||
(In reply to Matt Pressman [:mpressman] from comment #7)
> The backfill for 4/27 has been run and I've kicked memcached.
Yes, the data was OK after this; marking as fixed.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Data & BI Services Team