Closed Bug 1450683 Opened 7 years ago Closed 7 years ago

[traceback] IntegrityError: new row for relation "raw_crashes_20180326" violates check constraint "raw_crashes_20180326_date_check"

Categories

(Socorro :: Database, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Details

Sentry: https://sentry.prod.mozaws.net/operations/socorro-new-prod/issues/1251777/

"""
IntegrityError: new row for relation "raw_crashes_20180326" violates check constraint "raw_crashes_20180326_date_check"
DETAIL: Failing row contains (739d26f2-fe72-4e99-8ef0-5448f0180329, {"TotalPageFile": "7311036416", "ContentSandboxEnabled": "1", "I..., 2018-04-02 13:51:03.394379+00).

  File "socorro/external/crashstorage_base.py", line 670, in save_raw_and_processed
    crash_id
  File "socorro/external/statsd/statsd_base.py", line 133, in benchmarker
    result = wrapped_attr(*args, **kwargs)
  File "socorro/external/crashstorage_base.py", line 297, in save_raw_and_processed
    self.save_raw_crash(raw_crash, dumps, crash_id)
  File "socorro/external/postgresql/crashstorage.py", line 285, in save_raw_crash
    self.transaction(self._save_raw_crash_transaction, raw_crash, crash_id)
  File "socorro/database/transaction_executor.py", line 105, in __call__
    result = function(connection, *args, **kwargs)
  File "socorro/external/postgresql/crashstorage.py", line 326, in _save_raw_crash_transaction
    execute_no_results(connection, upsert_sql, values)
  File "socorro/external/postgresql/dbapi2_util.py", line 102, in execute_no_results
    a_cursor.execute(sql, parameters)
"""

We periodically had a few of these here and there, but since this morning (April 2nd), we've had 400+ per hour. This bug covers looking into this issue.
400/hour is curious. Socorro processed 13,000 crashes in the last hour, so this is a small percentage (roughly 3%) of the crashes processed. Grabbing this to look into.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
I can reproduce this locally, so I doubt this is a problem with -new-prod. So that's cool. I'm putting this in my list of things to look into today.
Priority: -- → P2
So weekly-create-partitions creates a raw_crashes_20180326 defined like this:

breakpad=# \d raw_crashes_20180326
          Table "public.raw_crashes_20180326"
     Column     |           Type           | Modifiers
----------------+--------------------------+-----------
 uuid           | uuid                     | not null
 raw_crash      | json                     | not null
 date_processed | timestamp with time zone |
Indexes:
    "raw_crashes_20180326_uuid" UNIQUE, btree (uuid)
    "raw_crashes_20180326_date_processed" btree (date_processed)
Check constraints:
    "raw_crashes_20180326_date_check" CHECK (date_processed >= '2018-03-26 00:00:00+00'::timestamp with time zone AND date_processed <= '2018-04-02 00:00:00+00'::timestamp with time zone)
Inherits: raw_crashes

The submitted_timestamp of these crashes is bouncing off of that. I claim that it's getting put in the wrong table--it should be getting put into the raw_crashes_20180402 table, which looks like this:

breakpad=# \d raw_crashes_20180402
          Table "public.raw_crashes_20180402"
     Column     |           Type           | Modifiers
----------------+--------------------------+-----------
 uuid           | uuid                     | not null
 raw_crash      | json                     | not null
 date_processed | timestamp with time zone |
Indexes:
    "raw_crashes_20180402_uuid" UNIQUE, btree (uuid)
    "raw_crashes_20180402_date_processed" btree (date_processed)
Check constraints:
    "raw_crashes_20180402_date_check" CHECK (date_processed >= '2018-04-02 00:00:00+00'::timestamp with time zone AND date_processed <= '2018-04-09 00:00:00+00'::timestamp with time zone)
Inherits: raw_crashes

I tried another recent crash and that saved just fine. I'll look into why these crashes are getting put in the wrong table.
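The partition scheme visible above can be sketched in Python: partitions are one week wide and named for the Monday that starts the week. This is an illustration of the naming convention in the \d output, not Socorro's actual partitioning code, and the function name is hypothetical:

```python
from datetime import datetime, timedelta, timezone

def weekly_partition_name(date_processed: datetime) -> str:
    """Return the raw_crashes partition that should hold this timestamp.

    Partitions are one week wide and named raw_crashes_YYYYMMDD for the
    Monday that starts the week (e.g. raw_crashes_20180402 covers
    2018-04-02 through 2018-04-09). Sketch only -- not Socorro's code.
    """
    # weekday() is 0 for Monday, so subtracting it lands on the week's Monday
    monday = date_processed - timedelta(days=date_processed.weekday())
    return "raw_crashes_" + monday.strftime("%Y%m%d")

# The failing row's date_processed (2018-04-02 13:51:03 UTC) belongs in
# the 20180402 partition, not the 20180326 one it was inserted into:
ts = datetime(2018, 4, 2, 13, 51, 3, tzinfo=timezone.utc)
print(weekly_partition_name(ts))  # raw_crashes_20180402
```

Running it on the failing row's timestamp shows the row belongs in raw_crashes_20180402, which is why the raw_crashes_20180326 date check rejects it.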
Aha! We've got a set of crashes (don't know how many, yet) where the crash id ends in 180329 but the submitted_timestamp is for 2018-04-02.

So the problem is that the processor determines which table to put the crash in via the crash id, but that table has a restriction from the submitted_timestamp. Theoretically, the two should be in sync because the collector creates the crash id and sets the submitted_timestamp at the same time.

Looks like the -stage-submitter in old prod is submitting these crashes over and over and over again:

422137c8-7440-4456-b1bf-1d6a70180329
63b79ce5-c1b4-495d-8fba-eb65d0180329
7c954ce8-df13-4490-95f5-43b8c0180329
60878a11-0eeb-4599-9ca7-9922d0180329
e819ac66-cca5-416d-b8c9-342410180329
859a598b-c478-4e7b-bdb1-672860180329
4c451bb3-7cb9-49aa-bde2-3fffb0180329
13cc62d6-a38b-4527-aa1c-b0e600180329
739d26f2-fe72-4e99-8ef0-5448f0180329

I wonder if it's been busted all this time where it "finishes", but it actually hasn't acked those items in the queue, yet, and then shuts down. So since those items weren't acked, next time it starts up, it redoes them. Then it breaks all the laws of physics going forward.

I shut off the stage submitter for now. We don't need it anyhow--it's not doing anything. I'll see if the errors fall off in Sentry. If they do--that's cool. If not, then we need to figure out how to fix the current issue and then move on to fixing the "how'd we end up in this situation?" issue.
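The mismatch described above (crash id says 180329, submitted_timestamp says 2018-04-02) can be sketched as a small check: the last six digits of a Socorro crash id encode a yymmdd date, and trouble starts when that date and the submitted_timestamp fall in different weekly partitions. The function names here are hypothetical illustrations, not the processor's actual code:

```python
from datetime import date, datetime, timedelta, timezone

def crash_id_date(crash_id: str) -> date:
    """Date encoded in the last six digits (yymmdd) of a Socorro
    crash id, e.g. ...5448f0180329 -> 2018-03-29."""
    return datetime.strptime(crash_id[-6:], "%y%m%d").date()

def week_start(d: date) -> date:
    """Monday starting the week containing d (the weekly raw_crashes
    partitions start on Mondays)."""
    return d - timedelta(days=d.weekday())

def lands_in_wrong_partition(crash_id: str, submitted: datetime) -> bool:
    """True when the crash id's embedded date and the submitted_timestamp
    fall in different weekly partitions -- the situation that trips the
    date check, since the crash id picks the table while the timestamp
    is what the constraint validates. Sketch only, not Socorro's code."""
    return week_start(crash_id_date(crash_id)) != week_start(submitted.date())

cid = "739d26f2-fe72-4e99-8ef0-5448f0180329"
submitted = datetime(2018, 4, 2, 13, 51, 3, tzinfo=timezone.utc)
print(crash_id_date(cid))                        # 2018-03-29
print(lands_in_wrong_partition(cid, submitted))  # True
```

For the failing row from the Sentry report, the id's embedded date (2018-03-29) is in the week of 2018-03-26 while the submitted_timestamp is in the week of 2018-04-02, so the insert hits the wrong partition's check constraint.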
Shutting off the stage submitter stopped the flow of IntegrityErrors. Yay! Given that, I'm going to mark this FIXED. I'll add a note to bug #1447412.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED