Closed Bug 1340105 Opened 7 years ago Closed 7 years ago

Crash Reports schema file is not updated on S3

Categories: Socorro :: Backend
Type: task
Priority: Not set
Severity: normal
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: (Reporter: adrian, Assigned: adrian)

We have a crash_report.json JSON schema file in our repo that controls the data we send to Telemetry and the data Telemetry will accept from us. On our side we use our in-code version, but on Telemetry's side they download that file from our S3 bucket. We have a cron job that is supposed to update that file every hour, but it doesn't seem to be working: the current file is quite an old version, missing all the fields we added recently (raw_crash and memory fields).

Let's figure out what is wrong and fix it.
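
For illustration, here is a minimal sketch of what the hourly sync step amounts to, using boto3; the file path and bucket name are hypothetical stand-ins, not the actual production values:

    # upload_schema.py -- sketch of the cron job's upload step (assumed, not the real job)
    import boto3

    SCHEMA_PATH = "/path/to/socorro/schemas/crash_report.json"  # hypothetical path
    BUCKET = "socorro-schemas"                                  # hypothetical bucket

    def upload_schema():
        s3 = boto3.client("s3")
        # Publish whatever copy of the schema is on disk. If the code on this
        # node is stale, a stale schema gets published -- the failure above.
        s3.upload_file(SCHEMA_PATH, BUCKET, "crash_report.json")

    if __name__ == "__main__":
        upload_schema()

The key point: the job publishes the on-disk copy, so the published schema is only as fresh as the code on the node that runs the cron job.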
The file is not up-to-date on our admin node, the server on which our cron jobs run. I suspect it might not receive new code during releases.
Flags: needinfo?(miles)
The admin node definitely doesn't get updated code-wise. None of the long-running nodes do. We should probably change the cron job to pull from GitHub, or have the deploy push the file; a sketch of the first option follows below.
The long-running node conundrum is one I would like to fix. I don't know enough about the admin node and its role in the Socorro infra to rebuild it; stage-submitter would be feasible to do, but the plan thus far has been "redo everything in the great migration."
Flags: needinfo?(miles)
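
A minimal sketch of the pull-from-GitHub alternative suggested above, assuming the requests library; the raw URL and bucket name are guesses at the shape of the thing, not the real values:

    # sync_schema_from_github.py -- sketch: publish the schema straight from the repo
    import boto3
    import requests

    # Hypothetical raw-file URL; the real repo layout may differ.
    RAW_URL = ("https://raw.githubusercontent.com/mozilla-services/"
               "socorro/master/socorro/schemas/crash_report.json")
    BUCKET = "socorro-schemas"  # hypothetical bucket

    def sync_schema():
        resp = requests.get(RAW_URL, timeout=30)
        resp.raise_for_status()
        # Fetching from GitHub sidesteps the stale admin node entirely.
        boto3.client("s3").put_object(
            Bucket=BUCKET, Key="crash_report.json", Body=resp.content
        )

    if __name__ == "__main__":
        sync_schema()

Having the deploy push the file would achieve the same thing by running the upload as a release step instead of on a timer.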
I just updated the -prod admin node from 252 to 263 (the current release). That should fix this immediate issue. Bug #1341755 covers changing things so admin nodes are up-to-date.

Adrian: Can you check if everything is good tomorrow?
Yay, the file is updated in our S3 bucket! Marco, can you confirm that the data is making it to Telemetry?
Flags: needinfo?(mcastelluccio)
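
One quick way to double-check, sketched with boto3 (bucket and key again hypothetical): look at the object's LastModified timestamp and confirm the recently added fields are present.

    # check_schema.py -- sketch: verify the published schema is current
    import json
    import boto3

    BUCKET = "socorro-schemas"  # hypothetical bucket

    s3 = boto3.client("s3")
    head = s3.head_object(Bucket=BUCKET, Key="crash_report.json")
    print("Last modified:", head["LastModified"])

    obj = s3.get_object(Bucket=BUCKET, Key="crash_report.json")
    schema = json.loads(obj["Body"].read())
    # The exact schema layout is an assumption; the bug says the missing
    # fields included raw_crash and the memory fields.
    print("raw_crash" in json.dumps(schema))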
The schema was updated, if I run "DESCRIBE socorro_crash" I can see the new fields.

Queries with a crash date <= 20170221 fail now, with the message "Error running query: Error opening Hive split s3://telemetry-parquet/socorro_crash/v1/crash_date=20170220/part-r-00000-8ac03e54-063a-4cbe-a42e-e92a84ee5a29.snappy.parquet (offset=527149590, length=35143306): Schema mismatch, metastore schema for row column json_dump has 12 fields but parquet schema has 11 fields".

For queries with a crash date >= 20170222, the fields that were already present in the previous schema are now always "null", the new fields are empty.
Flags: needinfo?(mcastelluccio)
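
What that Hive error is saying: the Parquet files for older partitions were written with the previous 11-field json_dump struct, while the updated metastore now declares 12 fields, and the reader refuses to reconcile them. A sketch of how one might confirm this with PySpark, using the path from the error message (assumes S3 read access is configured; mergeSchema only unions the schemas for this one read, it does not repair the table):

    # inspect_mismatch.py -- sketch: compare the old partition's schema to the new one
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("socorro-schema-check").getOrCreate()

    # A partition written before the schema change (path from the error above).
    old = spark.read.parquet(
        "s3://telemetry-parquet/socorro_crash/v1/crash_date=20170220/"
    )
    # How many fields does json_dump actually have in the old files?
    print(len(old.schema["json_dump"].dataType.fields))  # expect 11

    # Reading across partitions with schema merging unions the old and new structs.
    merged = spark.read.option("mergeSchema", "true").parquet(
        "s3://telemetry-parquet/socorro_crash/v1/"
    )
    merged.printSchema()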
(In reply to Marco Castelluccio [:marco] from comment #6)
> The schema was updated, if I run "DESCRIBE socorro_crash" I can see the new
> fields.
> 
> Queries with a crash date <= 20170221 fail now, with the message "Error
> running query: Error opening Hive split
> s3://telemetry-parquet/socorro_crash/v1/crash_date=20170220/part-r-00000-
> 8ac03e54-063a-4cbe-a42e-e92a84ee5a29.snappy.parquet (offset=527149590,
> length=35143306): Schema mismatch, metastore schema for row column json_dump
> has 12 fields but parquet schema has 11 fields".
> 
> For queries with a crash date >= 20170222, the fields that were already
> present in the previous schema are now always "null", the new fields are
> empty.

My Hive-fu is poor, but I sense two separate topics going on here.

1. The update of the crash_reports.json file (i.e. telling Telemetry which fields to copy from Socorro to Telemetry's S3) is now fixed. Right?

2. The column is *there* from >=20170222 but the *value* is always null. That's curious. The procedure is that once a day Telemetry downloads crash_reports.json from the S3 bucket, loops over all .json files in that day's S3 bucket, and inserts them into Spark. (perhaps the right lingo is "inserts them WITH Spark")

If that understanding is correct, it would mean we need to ping mreid or amiyaguchi and have them debug what data gets inserted.

Marco, minding the fact that another day has gone by, are the values still just null on these new fields? Also, which fields are we talking about and can you help us by showing how you debug it?
Flags: needinfo?(mcastelluccio)
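
For context, a rough sketch of that daily procedure as described in point 2, in PySpark; the bucket names and paths are assumptions, and this is not the actual Telemetry import job:

    # daily_import.py -- sketch of the described once-a-day import (assumed shape)
    import json
    import boto3
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("socorro-import").getOrCreate()
    s3 = boto3.client("s3")

    # 1. Download the published schema (hypothetical bucket/key).
    obj = s3.get_object(Bucket="socorro-schemas", Key="crash_reports.json")
    schema_json = json.loads(obj["Body"].read())
    # ... derive the expected column set from schema_json here (omitted) ...

    # 2. Read all of that day's .json crash files and write one partition.
    day = "20170223"
    crashes = spark.read.json("s3://socorro-crashes/%s/*.json" % day)  # hypothetical path
    crashes.write.mode("overwrite").parquet(
        "s3://telemetry-parquet/socorro_crash/v1/crash_date=%s/" % day
    )

If the column mapping derived from the schema ever stops lining up with the keys in the JSON files, the mapped columns come out null, which would be consistent with what comment #6 reports; that is exactly the kind of thing mreid or amiyaguchi would need to debug on their side.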
See Also: → 1342936
(In reply to Peter Bengtsson [:peterbe] from comment #7)
> Marco, minding the fact that another day has gone by, are the values still
> just null on these new fields? Also, which fields are we talking about and
> can you help us by showing how you debug it?

I'm just running a query like "SELECT * FROM socorro_crash WHERE crash_date = '20170223' LIMIT 10" on https://sql.telemetry.mozilla.org/queries/new.

I think all the fields from the old schema are "null" and all the fields we added in the new schema are empty.
Flags: needinfo?(mcastelluccio)
This is resolved.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED