Bug 1450683 · Closed · Opened 8 years ago · Closed 8 years ago
[traceback] IntegrityError: new row for relation "raw_crashes_20180326" violates check constraint "raw_crashes_20180326_date_check"
Categories: Socorro :: Database (task, P2)
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: willkg · Assignee: willkg
Sentry: https://sentry.prod.mozaws.net/operations/socorro-new-prod/issues/1251777/
"""
IntegrityError: new row for relation "raw_crashes_20180326" violates check constraint "raw_crashes_20180326_date_check"
DETAIL: Failing row contains (739d26f2-fe72-4e99-8ef0-5448f0180329, {"TotalPageFile": "7311036416", "ContentSandboxEnabled": "1", "I..., 2018-04-02 13:51:03.394379+00).
File "socorro/external/crashstorage_base.py", line 670, in save_raw_and_processed
crash_id
File "socorro/external/statsd/statsd_base.py", line 133, in benchmarker
result = wrapped_attr(*args, **kwargs)
File "socorro/external/crashstorage_base.py", line 297, in save_raw_and_processed
self.save_raw_crash(raw_crash, dumps, crash_id)
File "socorro/external/postgresql/crashstorage.py", line 285, in save_raw_crash
self.transaction(self._save_raw_crash_transaction, raw_crash, crash_id)
File "socorro/database/transaction_executor.py", line 105, in __call__
result = function(connection, *args, **kwargs)
File "socorro/external/postgresql/crashstorage.py", line 326, in _save_raw_crash_transaction
execute_no_results(connection, upsert_sql, values)
File "socorro/external/postgresql/dbapi2_util.py", line 102, in execute_no_results
a_cursor.execute(sql, parameters)
"""
We've periodically had a few of these here and there, but since this morning (April 2nd), we've had 400+ per hour.
This bug covers looking into this issue.
Comment 1•8 years ago
400/hour is curious. Socorro processed 13,000 crashes in the last hour, so this is a small percentage (roughly 3%) of the crashes processed.
Grabbing this to look into.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Comment 2•8 years ago
I can reproduce this locally, so I doubt this is a problem with -new-prod. So that's cool.
I'm putting this in my list of things to look into today.
Priority: -- → P2
Comment 3•8 years ago
So weekly-create-partitions creates a raw_crashes_20180326 table defined like this:
breakpad=# \d raw_crashes_20180326
          Table "public.raw_crashes_20180326"
     Column     |           Type           | Modifiers
----------------+--------------------------+-----------
 uuid           | uuid                     | not null
 raw_crash      | json                     | not null
 date_processed | timestamp with time zone |
Indexes:
    "raw_crashes_20180326_uuid" UNIQUE, btree (uuid)
    "raw_crashes_20180326_date_processed" btree (date_processed)
Check constraints:
    "raw_crashes_20180326_date_check" CHECK (date_processed >= '2018-03-26 00:00:00+00'::timestamp with time zone AND date_processed <= '2018-04-02 00:00:00+00'::timestamp with time zone)
Inherits: raw_crashes
The submitted_timestamp of these crashes is bouncing off of that. I claim that it's getting put in the wrong table--it should be getting put into the raw_crashes_20180402 table which looks like this:
breakpad=# \d raw_crashes_20180402
          Table "public.raw_crashes_20180402"
     Column     |           Type           | Modifiers
----------------+--------------------------+-----------
 uuid           | uuid                     | not null
 raw_crash      | json                     | not null
 date_processed | timestamp with time zone |
Indexes:
    "raw_crashes_20180402_uuid" UNIQUE, btree (uuid)
    "raw_crashes_20180402_date_processed" btree (date_processed)
Check constraints:
    "raw_crashes_20180402_date_check" CHECK (date_processed >= '2018-04-02 00:00:00+00'::timestamp with time zone AND date_processed <= '2018-04-09 00:00:00+00'::timestamp with time zone)
Inherits: raw_crashes
I tried another recent crash and that saved just fine.
I'll look into why these crashes are getting put in the wrong table.
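For reference, the weekly-partition selection can be sketched in a few lines of Python. This is my own minimal reconstruction (the function name is hypothetical, not Socorro's actual code), assuming partitions run Monday-to-Monday as the 20180326/20180402 names and the date_check constraints above indicate:

```python
from datetime import datetime, timedelta

def weekly_partition_name(date_processed):
    """Return the raw_crashes_YYYYMMDD partition covering date_processed.

    Each partition covers a Monday 00:00 (inclusive) through the next
    Monday, matching the *_date_check constraints shown above.
    """
    # weekday() is 0 for Monday, so this rounds down to the week's Monday
    monday = date_processed.date() - timedelta(days=date_processed.weekday())
    return "raw_crashes_%s" % monday.strftime("%Y%m%d")

# A crash with date_processed 2018-04-02 belongs in raw_crashes_20180402,
# not raw_crashes_20180326, whose date_check it violates.
print(weekly_partition_name(datetime(2018, 4, 2, 13, 51, 3)))
# raw_crashes_20180402
```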
Comment 4•8 years ago
Aha! We've got a set of crashes (don't know how many, yet) where the crash id ends in 180329 but the submitted_timestamp is for 2018-04-02.
So the problem is that the processor determines which table to put the crash in via the crash id, but that table has a restriction from the submitted_timestamp.
Theoretically, the two should be in sync because the collector creates the crash id and sets the submitted_timestamp at the same time.
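The mismatch is easy to see in Python. The crash id's last six characters encode YYMMDD (as the 180329 suffix in the failing example shows); the helper name here is mine:

```python
from datetime import datetime

def crash_id_date(crash_id):
    """Extract the YYMMDD date encoded in the tail of a crash id."""
    return datetime.strptime(crash_id[-6:], "%y%m%d").date()

crash_id = "739d26f2-fe72-4e99-8ef0-5448f0180329"
submitted = datetime(2018, 4, 2, 13, 51, 3)

# The id says 2018-03-29, but the timestamp says 2018-04-02: the processor
# picks the partition from the id while the constraint checks the timestamp.
print(crash_id_date(crash_id))                      # 2018-03-29
print(crash_id_date(crash_id) == submitted.date())  # False
```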
Looks like the -stage-submitter in old prod is submitting these crashes over and over and over again:
422137c8-7440-4456-b1bf-1d6a70180329
63b79ce5-c1b4-495d-8fba-eb65d0180329
7c954ce8-df13-4490-95f5-43b8c0180329
60878a11-0eeb-4599-9ca7-9922d0180329
e819ac66-cca5-416d-b8c9-342410180329
859a598b-c478-4e7b-bdb1-672860180329
4c451bb3-7cb9-49aa-bde2-3fffb0180329
13cc62d6-a38b-4527-aa1c-b0e600180329
739d26f2-fe72-4e99-8ef0-5448f0180329
I wonder if it's been busted all this time where it "finishes", but it actually hasn't acked those items in the queue, yet, and then shuts down. So since those items weren't acked, next time it starts up, it redoes them. Then it breaks all the laws of physics going forward.
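The suspected failure mode can be sketched with a toy queue. This is only an illustration of ack semantics under my assumption about the bug, not the actual queue code Socorro uses:

```python
import collections

class ToyQueue:
    """Toy queue with RabbitMQ-style ack semantics (illustration only)."""

    def __init__(self, items):
        self.pending = collections.deque(items)
        self.unacked = []

    def get(self):
        # Consuming an item does NOT remove it for good; it must be acked.
        item = self.pending.popleft()
        self.unacked.append(item)
        return item

    def ack(self, item):
        self.unacked.remove(item)

    def restart(self):
        # Anything consumed but never acked gets redelivered on restart.
        self.pending.extend(self.unacked)
        self.unacked = []

q = ToyQueue(["crash-1", "crash-2"])
q.get()
q.get()          # the submitter processes both crashes ...
q.restart()      # ... but shuts down without acking them
print(len(q.pending))  # 2 -- the same crashes come back next run
```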
I shut off the stage submitter for now. We don't need it anyhow--it's not doing anything. I'll see if the errors fall off in Sentry. If they do--that's cool. If not, then we need to figure out how to fix the current issue and then move on to fixing the "how'd we end up in this situation?" issue.
Comment 5•8 years ago
Shutting off the stage submitter stopped the flow of IntegrityErrors. Yay!
Given that, I'm going to mark this FIXED. I'll add a note to bug #1447412.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED