Closed Bug 1340105 Opened 7 years ago Closed 7 years ago

Crash Reports schema file is not updated on S3

Categories: Socorro :: Backend
Type: task
Priority: Not set
Severity: normal
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: (Reporter: adrian, Assigned: adrian)

We have a crash_report.json JSON schema file in our repo that controls the data we send to Telemetry and the data Telemetry will accept from us. On our side we use our in-code version, but on Telemetry's side they download that file from our S3 bucket. We have a cron job that is supposed to update that file every hour, but it doesn't seem to be working: the current file is quite an old version, missing all the fields we added recently (raw_crash and memory fields).

Let's figure out what is wrong and fix it.
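
For illustration, here is a minimal sketch of what the hourly sync step amounts to, using boto3; the file path and bucket name are hypothetical stand-ins, not the actual production values:

    # upload_schema.py -- sketch of the cron job's upload step (assumed, not the real job)
    import boto3

    SCHEMA_PATH = "/path/to/socorro/schemas/crash_report.json"  # hypothetical path
    BUCKET = "socorro-schemas"                                  # hypothetical bucket

    def upload_schema():
        s3 = boto3.client("s3")
        # Publish whatever copy of the schema is on disk. If the code on this
        # node is stale, a stale schema gets published -- the failure above.
        s3.upload_file(SCHEMA_PATH, BUCKET, "crash_report.json")

    if __name__ == "__main__":
        upload_schema()

The key point: the job publishes the on-disk copy, so the published schema is only as fresh as the code on the node that runs the cron job.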
The file is not up-to-date on our admin node, the server on which our cron jobs run. I suspect it might not receive new code during releases.
Flags: needinfo?(miles)
The admin node definitely doesn't get updated code-wise. None of the long-running nodes do. We should probably change the cron job to pull from GitHub, or have the deploy push the file; a sketch of the first option follows below.
The long-running node conundrum is one I would like to fix. I don't know enough about the admin node and its role in the Socorro infra to rebuild it; stage-submitter would be feasible to do, but the plan thus far has been "redo everything in the great migration."
Flags: needinfo?(miles)
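
A minimal sketch of the pull-from-GitHub alternative suggested above, assuming the requests library; the raw URL and bucket name are guesses at the shape of the thing, not the real values:

    # sync_schema_from_github.py -- sketch: publish the schema straight from the repo
    import boto3
    import requests

    # Hypothetical raw-file URL; the real repo layout may differ.
    RAW_URL = ("https://raw.githubusercontent.com/mozilla-services/"
               "socorro/master/socorro/schemas/crash_report.json")
    BUCKET = "socorro-schemas"  # hypothetical bucket

    def sync_schema():
        resp = requests.get(RAW_URL, timeout=30)
        resp.raise_for_status()
        # Fetching from GitHub sidesteps the stale admin node entirely.
        boto3.client("s3").put_object(
            Bucket=BUCKET, Key="crash_report.json", Body=resp.content
        )

    if __name__ == "__main__":
        sync_schema()

Having the deploy push the file would achieve the same thing by running the upload as a release step instead of on a timer.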
I just updated the -prod admin node from 252 to 263 (the current release). That should fix this immediate issue. Bug #1341755 covers changing things so admin nodes are up-to-date.

Adrian: Can you check if everything is good tomorrow?
Yay, the file is updated in our S3 bucket! Marco, can you confirm that the data is making it to Telemetry?
Flags: needinfo?(mcastelluccio)
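
One quick way to double-check, sketched with boto3 (bucket and key again hypothetical): look at the object's LastModified timestamp and confirm the recently added fields are present.

    # check_schema.py -- sketch: verify the published schema is current
    import json
    import boto3

    BUCKET = "socorro-schemas"  # hypothetical bucket

    s3 = boto3.client("s3")
    head = s3.head_object(Bucket=BUCKET, Key="crash_report.json")
    print("Last modified:", head["LastModified"])

    obj = s3.get_object(Bucket=BUCKET, Key="crash_report.json")
    schema = json.loads(obj["Body"].read())
    # The exact schema layout is an assumption; the bug says the missing
    # fields included raw_crash and the memory fields.
    print("raw_crash" in json.dumps(schema))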
The schema was updated, if I run "DESCRIBE socorro_crash" I can see the new fields.

Queries with a crash date <= 20170221 fail now, with the message "Error running query: Error opening Hive split s3://telemetry-parquet/socorro_crash/v1/crash_date=20170220/part-r-00000-8ac03e54-063a-4cbe-a42e-e92a84ee5a29.snappy.parquet (offset=527149590, length=35143306): Schema mismatch, metastore schema for row column json_dump has 12 fields but parquet schema has 11 fields".

For queries with a crash date >= 20170222, the fields that were already present in the previous schema are now always "null", the new fields are empty.
Flags: needinfo?(mcastelluccio)
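
What that Hive error is saying: the Parquet files for older partitions were written with the previous 11-field json_dump struct, while the updated metastore now declares 12 fields, and the reader refuses to reconcile them. A sketch of how one might confirm this with PySpark, using the path from the error message (assumes S3 read access is configured; mergeSchema only unions the schemas for this one read, it does not repair the table):

    # inspect_mismatch.py -- sketch: compare the old partition's schema to the new one
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("socorro-schema-check").getOrCreate()

    # A partition written before the schema change (path from the error above).
    old = spark.read.parquet(
        "s3://telemetry-parquet/socorro_crash/v1/crash_date=20170220/"
    )
    # How many fields does json_dump actually have in the old files?
    print(len(old.schema["json_dump"].dataType.fields))  # expect 11

    # Reading across partitions with schema merging unions the old and new structs.
    merged = spark.read.option("mergeSchema", "true").parquet(
        "s3://telemetry-parquet/socorro_crash/v1/"
    )
    merged.printSchema()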
(In reply to Marco Castelluccio [:marco] from comment #6)
> The schema was updated, if I run "DESCRIBE socorro_crash" I can see the new
> fields.
> 
> Queries with a crash date <= 20170221 fail now, with the message "Error
> running query: Error opening Hive split
> s3://telemetry-parquet/socorro_crash/v1/crash_date=20170220/part-r-00000-
> 8ac03e54-063a-4cbe-a42e-e92a84ee5a29.snappy.parquet (offset=527149590,
> length=35143306): Schema mismatch, metastore schema for row column json_dump
> has 12 fields but parquet schema has 11 fields".
> 
> For queries with a crash date >= 20170222, the fields that were already
> present in the previous schema are now always "null", the new fields are
> empty.

My Hive-fu is poor, but I sense two separate topics going on here.

1. The update of the crash_reports.json file (i.e. telling Telemetry which fields to copy from Socorro to Telemetry's S3) is now fixed. Right?

2. The column is *there* from >=20170222 but the *value* is always null. That's curious. The procedure is that once a day Telemetry downloads crash_reports.json from the S3 bucket, loops over all .json files in that day's S3 bucket, and inserts them into Spark. (perhaps the right lingo is "inserts them WITH Spark")

If that understanding is correct, it would mean we need to ping mreid or amiyaguchi and have them debug what data gets inserted.

Marco, minding the fact that another day has gone by, are the values still just null on these new fields? Also, which fields are we talking about and can you help us by showing how you debug it?
Flags: needinfo?(mcastelluccio)
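
For context, a rough sketch of that daily procedure as described in point 2, in PySpark; the bucket names and paths are assumptions, and this is not the actual Telemetry import job:

    # daily_import.py -- sketch of the described once-a-day import (assumed shape)
    import json
    import boto3
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("socorro-import").getOrCreate()
    s3 = boto3.client("s3")

    # 1. Download the published schema (hypothetical bucket/key).
    obj = s3.get_object(Bucket="socorro-schemas", Key="crash_reports.json")
    schema_json = json.loads(obj["Body"].read())
    # ... derive the expected column set from schema_json here (omitted) ...

    # 2. Read all of that day's .json crash files and write one partition.
    day = "20170223"
    crashes = spark.read.json("s3://socorro-crashes/%s/*.json" % day)  # hypothetical path
    crashes.write.mode("overwrite").parquet(
        "s3://telemetry-parquet/socorro_crash/v1/crash_date=%s/" % day
    )

If the column mapping derived from the schema ever stops lining up with the keys in the JSON files, the mapped columns come out null, which would be consistent with what comment #6 reports; that is exactly the kind of thing mreid or amiyaguchi would need to debug on their side.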
See Also: → 1342936
(In reply to Peter Bengtsson [:peterbe] from comment #7)
> Marco, minding the fact that another day has gone by, are the values still
> just null on these new fields? Also, which fields are we talking about and
> can you help us by showing how you debug it?

I'm just running a query like "SELECT * FROM socorro_crash WHERE crash_date = '20170223' LIMIT 10" on https://sql.telemetry.mozilla.org/queries/new.

I think all the fields from the old schema are "null" and all the fields we added in the new schema are empty.
Flags: needinfo?(mcastelluccio)
This is resolved.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED