Closed Bug 1023867 Opened 10 years ago Closed 10 years ago

Failure on 'integer out of range' on duplicates checker - causing most matviews to fail (reports-clean depends on this)

Categories

(Socorro :: Database, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: selenamarie, Assigned: selenamarie)

Details

(Whiteboard: [DB change])

Attachments

(1 file)

Woke to this alert: 05:42 < nagios-phx1> | Wed 05:42:31 PDT [1398] sp-admin01.phx1.mozilla.com:Socorro Admin - crontab is CRITICAL: CRITICAL - duplicates (DuplicatesCronApp) (http://m.mozilla.org/Socorro+Admin+-+crontab) Had been occurring since: 22:55 < nagios-phx1> | Tue 22:55:53 PDT [1135] sp-admin01.phx1.mozilla.com:Socorro Admin - crontab is CRITICAL: CRITICAL - duplicates (DuplicatesCronApp) (http://m.mozilla.org/Socorro+Admin+-+crontab) Initial triage indicates a previously un-checked input field, although unsure exactly which one at this point: 2014-06-11 06:35:07,186 DEBUG - MainThread - error when running <class 'socorro.cron.jobs.matviews.DuplicatesCronApp'> on None Traceback (most recent call last): File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/crontabber/app.py", line 973, in _run_one for last_success in self._run_job(job_class, config, info): File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/crontabber/base.py", line 199, in main function(when) File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/crontabber/mixins.py", line 150, in _run_proxy **kwargs File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/crontabber/transaction_executor.py", line 46, in __call__ result = function(connection, *args, **kwargs) File "/data/socorro/application/socorro/cron/jobs/matviews.py", line 174, in run self.run_proc(connection, [start_time, end_time]) File "/data/socorro/application/socorro/cron/jobs/matviews.py", line 28, in run_proc cursor.callproc(self.get_proc_name(), signature) DataError: integer out of range CONTEXT: SQL statement "create temporary table new_reports_duplicates on commit drop as select follower.uuid as uuid, leader.uuid as duplicate_of, follower.date_processed from ( select uuid, install_age, uptime, client_crash_date, date_processed, first_value(uuid) over ( partition by product, version, build, signature, cpu_name, cpu_info, os_name, os_version, address, topmost_filenames, reason, app_notes, url order by client_crash_date, uuid ) as leader_uuid from reports where date_processed BETWEEN start_time AND end_time ) as follower JOIN ( select uuid, install_age, uptime, client_crash_date FROM reports where date_processed BETWEEN start_time AND end_time ) as leader ON follower.leader_uuid = leader.uuid WHERE ( same_time_fuzzy(leader.client_crash_date, follower.client_crash_date, leader.uptime, follower.uptime) OR follower.uptime < 60 ) AND same_time_fuzzy(leader.client_crash_date, follower.client_crash_date, leader.install_age, follower.install_age) AND follower.uuid <> leader.uuid" PL/pgSQL function update_reports_duplicates(timestamp with time zone,timestamp with time zone) line 10 at SQL statement
Assignee: nobody → sdeckelmann
Specific query that started the failure is: SELECT * FROM update_reports_duplicates('2014-06-11T01:28:19.183999+00:00'::timestamptz,'2014-06-11T02:28:19.183999+00:00'::timestamptz)
Found it! We're allowing subtraction on large negative and positive values in the 'install_age' fields. We should detect the out of bounds error *before* the subtraction to fix this. :) Here's how I found the error: create table selena_new_reports_duplicates as select follower.uuid as uuid, leader.uuid as duplicate_of, follower.date_processed from ( select uuid, install_age, uptime, client_crash_date, date_processed, first_value(uuid) over ( partition by product, version, build, signature, cpu_name, cpu_info, os_name, os_version, address, topmost_filenames, reason, app_notes, url order by client_crash_date, uuid ) as leader_uuid from reports where date_processed BETWEEN 2014-06-11T01:28:19.183999+00:00'::timestamptz AND 2014-06-11T02:28:19.183999+00:00'::timestamptz ) as follower JOIN ( select uuid, install_age, uptime, client_crash_date FROM reports where date_processed BETWEEN 2014-06-11T01:28:19.183999+00:00'::timestamptz AND 2014-06-11T02:28:19.183999+00:00'::timestamptz ) as leader ON follower.leader_uuid = leader.uuid WHERE ( same_time_fuzzy(leader.client_crash_date, follower.client_crash_date, leader.uptime, follower.uptime) OR follower.uptime < 60 ) AND same_time_fuzzy(leader.client_crash_date, follower.client_crash_date, leader.install_age, follower.install_age) AND follower.uuid <> leader.uuid DO language plpgsql $$ DECLARE r record; temp boolean; BEGIN for r in select * from selena_new_reports_duplicates LOOP RAISE NOTICE 'install ages: leader (%), follower (%)', r.linstall_age, r.finstall_age; SELECT same_time_fuzzy(r.ldate_processed, r.fdate_processed, r.linstall_age, r.finstall_age) INTO temp; END LOOP; END$$;
Commits pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/ff5bddd62b565d107e7dadb5c6dad4761de45fe5 Fixes bug 1023867 - stop treating negative install_age as valid https://github.com/mozilla/socorro/commit/f0579ebf28141df9aef69496572a3c864e2a29b3 Migration for bug 1023867 https://github.com/mozilla/socorro/commit/0f1d3886a70162ad933701e5d86cabb8381438cc Merge pull request #2104 from selenamarie/bug1023867-duplicates-integer-problem Fixes bug 1023867 - stop treating negative install_age as valid
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 89
Whiteboard: [DB change]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: