BMO ETL: Bug in the export script when processing attachments causes it to cache the same attachment ID over and over, filling up the cache table
Categories
(bugzilla.mozilla.org :: Infrastructure, task, P1)
Tracking
()
People
(Reporter: dkl, Assigned: dkl)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
When the export script loops over the rows in the attachments table and reaches ID 8388607, it does not advance to the next attach_id. Instead it keeps re-entering data into the cache table under that same ID until it reaches the end of the attachments table. This is leaving a large number of broken rows in the cache table and making it grow larger than needed. Need to fix this ASAP.
select count(id), id from bmo_etl_cache where table_name = 'attachments' group by id order by count(id) desc;
count(id)   id
11543674    8388607
1           1
1           2
1           3
1           4
...
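The check above can be rerun after any fix to confirm that no ID is cached more than once. Below is a minimal sketch of that check using Python's built-in sqlite3 module against a small in-memory stand-in for bmo_etl_cache. The column types, the demo rows, and the helper names `find_duplicated_ids` and `build_demo_cache` are assumptions for illustration only; the real cache table lives in the BMO database and its schema is not shown in this bug.

```python
import sqlite3


def find_duplicated_ids(conn, table_name="attachments"):
    """Return (id, row_count) pairs for ids cached more than once, worst first."""
    cur = conn.execute(
        """
        SELECT id, COUNT(id) AS row_count
          FROM bmo_etl_cache
         WHERE table_name = ?
         GROUP BY id
        HAVING COUNT(id) > 1
         ORDER BY row_count DESC
        """,
        (table_name,),
    )
    return cur.fetchall()


def build_demo_cache():
    """Build a tiny in-memory table shaped like the cache (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bmo_etl_cache "
        "(id INTEGER, table_name TEXT, snapshot_date TEXT, data TEXT)"
    )
    # Four healthy rows, one per attachment ID.
    rows = [(i, "attachments", "2014-03-10 17:06:33", "x") for i in (1, 2, 3, 4)]
    # Five rows stuck on the same ID, mimicking the symptom in this bug.
    rows += [(8388607, "attachments", "2014-03-10 17:42:06", "y")] * 5
    conn.executemany("INSERT INTO bmo_etl_cache VALUES (?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(find_duplicated_ids(build_demo_cache()))
```

Adding the HAVING clause limits the output to the problem IDs, which is easier to read than scanning the full grouped list on a table with millions of rows.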
Comment 1 (Assignee) • 16 days ago
More information:
- The modification timestamp and the data differ from row to row, so it is mainly just the ID that stays the same.
id      snapshot_date       length(data)
8388607 2014-03-10 17:06:33 215
8388607 2014-03-10 17:42:06 204
8388607 2014-03-10 17:42:06 204
8388607 2014-03-10 17:42:06 204
8388607 2014-03-10 17:42:06 204
8388607 2014-03-10 17:42:26 215
8388607 2014-03-10 17:42:26 215
8388607 2014-03-10 17:42:26 215
8388607 2014-03-10 17:42:26 215
8388607 2014-03-10 17:45:06 186
8388607 2014-03-10 17:45:06 186
8388607 2014-03-10 17:45:06 186
8388607 2014-03-10 17:45:06 186
8388607 2014-03-10 17:45:52 187
8388607 2014-03-10 17:45:52 187
8388607 2014-03-10 17:45:52 187
...
- There are no rows in the cache with an attachment ID higher than 8388607, so I assume all of the additional rows correspond to higher-numbered rows in the attachments table (matching on modification_ts and data); only the cached attachment ID is not changing.
- This also occurs at the same attachment ID on bugzilla-dev, so it is not limited to production.
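One thing worth checking, though nothing in this bug confirms it: 8388607 is 2^23 - 1, the largest value a signed MySQL MEDIUMINT can hold. If the cache table's id column (or some step in the export path) were limited to that range, MySQL in non-strict mode would store every out-of-range ID as the maximum rather than raising an error. That would match the symptoms above: changing timestamps and data, a fixed ID, and no cached ID above 8388607. The sketch below only simulates that clamping in Python; `clamp_to_mediumint` and `cached_ids` are hypothetical names, and the actual column types have not been verified here.

```python
# Largest signed MEDIUMINT value in MySQL: 2^23 - 1.
MEDIUMINT_MAX = 2**23 - 1


def clamp_to_mediumint(attach_id):
    """Mimic non-strict MySQL storing an out-of-range value as the column max."""
    return min(attach_id, MEDIUMINT_MAX)


def cached_ids(attach_ids):
    """IDs as they would land in a cache column that is (hypothetically) MEDIUMINT."""
    return [clamp_to_mediumint(a) for a in attach_ids]


if __name__ == "__main__":
    # Source IDs keep increasing, but the stored ID stops at the column maximum.
    print(cached_ids([8388606, 8388607, 8388608, 8388609]))
```

If this is the cause, the column definition of id in bmo_etl_cache on both production and bugzilla-dev would show a MEDIUMINT type, which is quick to confirm or rule out.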