Closed
Bug 1144861
Opened 9 years ago
Closed 7 years ago
raw_crashes data appears to have incorrect encoding on stage
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: selenamarie, Unassigned)
Details
Both of the following queries were run on Socorro's stage database.

This data was originally from production and part of the weekly 'stage refresh':

breakpad=# select rc.uuid, json(rc.raw_crash->'dump_checksums'), date_processed from raw_crashes rc where date_processed > '2015-03-15' limit 10 ;
                 uuid                 |                                                       json                                                        |        date_processed
--------------------------------------+-------------------------------------------------------------------------------------------------------------------+-------------------------------
 ed14476e-21d3-4536-a92c-9d3bf2150315 | {"upload_file_minidump": "6c5a6b5921880c321288133961c9667e"}                                                      | 2015-03-15 00:00:00.292512+00
 fc14a725-b36e-413c-8151-57fb92150315 | {"upload_file_minidump": "22df911851ade1460117f70624f1a3d0", "memory_report": "f860f3d6fc569d4acf297125f061d4e3"} | 2015-03-15 00:00:00.343784+00
 b7617932-4b0d-4ee8-bcba-67ed52150315 | {"upload_file_minidump": "e8ed78662ce75fc72ce3d1d6dc8fc108"}                                                      | 2015-03-15 00:00:01.061145+00
 3601d825-f82a-4044-8cc5-c47272150315 | {"upload_file_minidump": "b6fd7d62f3411d3e992e7077576b93bf"}                                                      | 2015-03-15 00:00:01.300166+00
 998067d6-95ce-401f-854f-5b3eb2150315 | {"upload_file_minidump": "2b215113b6bdb287f04a5c383e8a0e9a"}                                                      | 2015-03-15 00:00:01.473124+00
 0e116b54-c7d0-48b8-859e-dc4762150315 | {"upload_file_minidump": "d14a301fdb7a2871b464ece1af9d32f3"}                                                      | 2015-03-15 00:00:02.598883+00
 ce3f540c-8195-402e-b088-8aeff2150315 | {"upload_file_minidump": "2f85ddfe44c5c601089e7a9fd4837219"}                                                      | 2015-03-15 00:00:02.994641+00
 defe047e-e61f-4434-8547-131e32150315 | {"upload_file_minidump": "6e449b6f41603c809162c0305dadc1d8"}                                                      | 2015-03-15 00:00:03.48649+00
 8f011485-fa47-4c24-adf7-10eaf2150315 | {"upload_file_minidump": "dbe69780f60e9c84e0c00441196762be"}                                                      | 2015-03-15 00:00:03.927942+00
 2fa3bba8-87de-4077-9483-719482150315 | {"upload_file_minidump": "1a117984fe51bcced6d8345bd62c30c2", "memory_report": "40af22b61086b0fcf52d32e65f40a4d1"} | 2015-03-15 00:00:04.012292+00
(10 rows)

This data was inserted by the stage processors, from data submitted with our submitter_app.py:

breakpad=# select rc.uuid, json(rc.raw_crash->'dump_checksums'), date_processed from raw_crashes rc where date_processed > '2015-03-18' limit 10 ;
                 uuid                 |                               json                               |        date_processed
--------------------------------------+------------------------------------------------------------------+-------------------------------
 e923ccbe-b7f0-4210-88cd-cff3e2150317 | "{u'upload_file_minidump': u'b2caa96d06bc06bcad4ed42cce336831'}" | 2015-03-18 00:00:03.369622+00
 40d5ccde-819e-4541-8a6b-04de52150317 | "{u'upload_file_minidump': u'd41d8cd98f00b204e9800998ecf8427e'}" | 2015-03-18 00:00:03.731346+00
 b42a5384-8750-4cb6-baa0-7d6232150317 | "{u'upload_file_minidump': u'c06b6ac885b38f8025bbc43927aba6dd'}" | 2015-03-18 00:00:04.577034+00
 ed79dda7-ebf3-4fae-9f93-d58da2150317 | "{u'upload_file_minidump': u'dce8cc7d8857b5a9de8bd74c7839396f'}" | 2015-03-18 00:00:04.909234+00
 2aef1c25-d0f8-4e6b-864a-f13a32150317 | "{u'upload_file_minidump': u'd7b5952271e57d1c09ec16c01f9278fa'}" | 2015-03-18 00:00:05.038086+00
 f5554683-fb43-4043-9ee6-df9192150317 | "{u'upload_file_minidump': u'9c9127b1c547a10ef7ffd20eb355c56e'}" | 2015-03-18 00:00:05.264665+00
 22661c49-b443-4481-b686-a2cf72150317 | "{u'upload_file_minidump': u'3a06794e2a3745d0e6cfe26a1e544369'}" | 2015-03-18 00:00:05.521126+00
 dc19de4e-22e0-4ab2-9c6d-fdf872150317 | "{u'upload_file_minidump': u'3f852628536cae9f4f505e683f12e2de'}" | 2015-03-18 00:00:05.808538+00
 1fa0082b-610a-46cb-b8fd-035c62150317 | "{u'upload_file_minidump': u'a0f25b0b536c0dd558b0c9d2cd54b923'}" | 2015-03-18 00:00:05.966864+00
 60fa8823-e8ab-4c1a-a316-b96e12150317 | "{u'upload_file_minidump': u'b7727fa3aeb17a5106c4799300be0622'}" | 2015-03-18 00:00:06.037547+00
(10 rows)

breakpad=#

You can see that the second set of data is wrapped in a python UTF8 cast. Both prod/stage databases appear to be configured the same (9.2.9 postgres, locale is UTF8) and the stage processors have the correct locale setting.
We haven't updated psycopg2 in a while. My guess is that the submitter is doing something that's causing the crash to be interpreted as text instead of json by psycopg2 or Postgres.
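For anyone trying to reproduce the shape of the second result set, here is a minimal sketch of one way a value like that can arise. It assumes (this is a guess, not something confirmed in this bug) that the nested dump_checksums mapping gets stringified somewhere before the raw crash is serialized. The raw_crash contents and the checksums_kind helper are made up for illustration. Under Python 2 the stringified form would carry the u'...' prefixes seen above; Python 3 omits them, but the effect on the stored JSON type is the same.

```python
import json

# Hypothetical raw crash fragment, shaped like the rows in the description.
raw_crash = {
    "dump_checksums": {
        "upload_file_minidump": "d41d8cd98f00b204e9800998ecf8427e",
    },
}

# Expected path: the nested mapping is serialized as a JSON object.
good = json.dumps(raw_crash)

# Suspected path (an assumption): the nested mapping is turned into its
# Python string form first, so JSON only ever sees a plain string.
stringified = dict(raw_crash, dump_checksums=str(raw_crash["dump_checksums"]))
bad = json.dumps(stringified)


def checksums_kind(serialized):
    """Return the Python type name of dump_checksums after decoding."""
    return type(json.loads(serialized)["dump_checksums"]).__name__


print(checksums_kind(good))  # dict
print(checksums_kind(bad))   # str
```

A check along the lines of checksums_kind could be run over a sample of stored rows to see how many are affected.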
Comment 1•9 years ago
submitter_app sends data in the same manner as the clients do: individual fields via HTTP POST. The JSON is not sent as a whole piece. The collector is the process that assembles the raw crash into a JSON form. It is the processor that saves the raw crash to Postgres. See socorro/external/postgresql/crashstorage.py:148. On that line the raw_crash mapping is serialized into a JSON string form to be sent to the 'raw_crashes' table. One would think that they'd always be represented as strings in the 'raw_crashes' table.
Reporter
Comment 2•9 years ago
:lars, is the issue maybe somewhere in this:

socorro/collector/submitter_app.py:61
    return DotDict(json.load(raw_crash_fp))

vs

socorro/external/boto/crashstorage.py:243
    return json.loads(raw_crash_as_string, object_hook=DotDict)

Maybe there's a difference between the string we're storing in S3 and what is/was on disk? Although that should be functionally the same?
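To make the comparison concrete, here is a small sketch of how the two call styles differ. The DotDict class below is a bare dict subclass used as a hypothetical stand-in; Socorro's real DotDict may convert nested mappings on construction, which this does not model. The crash payload is also made up. What the sketch does show is standard json behaviour: object_hook is applied to every decoded JSON object, including nested ones, while wrapping the result of json.load only changes the outermost mapping.

```python
import io
import json


class DotDict(dict):
    """Hypothetical stand-in for Socorro's DotDict; only the type matters here."""


# Made-up raw crash payload with one nested mapping.
raw_crash_as_string = (
    '{"ProductName": "Firefox", '
    '"dump_checksums": {"upload_file_minidump": "abc"}}'
)

# Style used at submitter_app.py:61: decode, then wrap the top level.
raw_crash_fp = io.StringIO(raw_crash_as_string)
from_file = DotDict(json.load(raw_crash_fp))

# Style used at boto/crashstorage.py:243: wrap every decoded object.
from_string = json.loads(raw_crash_as_string, object_hook=DotDict)


def nested_type(crash):
    """Return the type name of the nested dump_checksums value."""
    return type(crash["dump_checksums"]).__name__


print(nested_type(from_file))    # dict
print(nested_type(from_string))  # DotDict
print(from_file == from_string)  # True: same contents, different nested types
```

So the two results compare equal, but any later code that treats a plain dict differently from a DotDict would behave differently depending on which path produced the crash.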
Comment 3•7 years ago
no longer a concern
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX