542624 - Need Socorro integration with Hadoop crash report storage to be able to retrieve crash reports

Assignee

Description

•

14 years ago

The flip side to bug 538206 storing crash reports in Hadoop is that Socorro needs to be able to pull them back out.
Once this is fully operational, we'll be able to drastically cut back storage of crash reports on the NFS server.

The Python code attached to bug 538206 contains two retrieval APIs, one that takes an OOID and another that takes a date range.

Please let me know what other requirements you might need in order to mimic the current way you interact with the NFS server to pull crash reports out.

Austin King [:ozten]

Comment 1

•

14 years ago

Currently the processed crash files are read by Apache and served up to:
* web browsers
* our PHP app via curl requests

Our app does not read these files via a filesystem.

The curl request is:
http://crash-stats.mozilla.com/dumps/<UUID>.jsonz

IT has control over this url and how it's served up.
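For illustration, the request described above can be sketched in Python. The URL pattern comes from this comment; everything else is an assumption, in particular that a .jsonz payload is gzip-compressed JSON. The helper names are hypothetical and the sketch makes no network call itself.

```python
import gzip
import json

# Base URL taken from the comment above.
DUMPS_BASE_URL = "http://crash-stats.mozilla.com/dumps"


def processed_crash_url(uuid, base_url=DUMPS_BASE_URL):
    """Build the URL the PHP app requests for one processed crash."""
    return "%s/%s.jsonz" % (base_url, uuid)


def decode_jsonz(payload):
    """Decode the bytes returned for a .jsonz request.

    ASSUMPTION: the payload is gzip-compressed JSON. If the server (or the
    HTTP client) has already decompressed it, the gzip magic bytes are
    absent and the payload is parsed as plain JSON.
    """
    if payload[:2] == b"\x1f\x8b":  # gzip magic number
        payload = gzip.decompress(payload)
    return json.loads(payload.decode("utf-8"))
```

The bytes passed to decode_jsonz would come from whatever HTTP client performs the request, for example curl in the PHP app.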

Daniel Einspanjer [:dre] [:deinspanjer]

Assignee

Comment 2

•

14 years ago

Processed crash files are a separate piece of this.

In order to get the processed crash files, the processor must retrieve the .json and .dump files.  That is the piece that needs to be extended to be able to retrieve them via a call to Hadoop instead of off of the NFS mount.

That said, it is a very important point that the output of the processor needs a place to live as well.  That would mean that the processor would probably want to retrieve a crash report, process it, then update the crash report in Hadoop with the .jsonz data.
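A minimal sketch of that retrieve, process, update cycle, using an in-memory dictionary as a stand-in for the Hadoop-backed store. The class, its method names, and the processor callable are all hypothetical and exist only to show the order of operations.

```python
class InMemoryCrashStore:
    """Hypothetical stand-in for the Hadoop-backed crash report store."""

    def __init__(self):
        self._rows = {}

    def put_raw(self, ooid, json_meta, dump):
        # Store the submitted .json metadata and the .dump binary.
        self._rows[ooid] = {"json": json_meta, "dump": dump}

    def get_raw(self, ooid):
        row = self._rows[ooid]
        return row["json"], row["dump"]

    def put_processed(self, ooid, jsonz):
        # Write the processor output back onto the same crash report.
        self._rows[ooid]["jsonz"] = jsonz

    def get_processed(self, ooid):
        return self._rows[ooid]["jsonz"]


def process_one(store, ooid, processor):
    """Retrieve a crash report, process it, and store the result."""
    json_meta, dump = store.get_raw(ooid)
    processed = processor(json_meta, dump)
    store.put_processed(ooid, processed)
    return processed
```

Here processor is any callable that takes the metadata and the dump and returns the processed data; the real processor is not modeled.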

Austin King [:ozten]

Comment 3

•

14 years ago

Yes, sorry I wasn't clear. Comment #1 is a dependency for moving completely off a traditional filesystem.

Austin King [:ozten]

Updated

•

14 years ago

Blocks: 543759

Laura Thomson :laura

Comment 4

•

14 years ago

-> pythonic middleware

Version: 1.x → 1.7

Daniel Einspanjer [:dre] [:deinspanjer]

Assignee

Comment 5

•

14 years ago

Socorro 1.7 will support retrieval of all three critical pieces of data:
meta_data:json (the original submitted json)
raw_data:dump (the minidump binary)
processed_data:json (the "jsonz" file)

We need to make sure that the loose ends are tied up, however. Maybe some blocking or depends bugs on this one?

The PHP app has no need to retrieve the original meta_data:json or the raw_data:dump, correct?  Currently, code is written in the monitor and processors to retrieve that data.

We need to make sure that calls to http://crash-stats.mozilla.com/dumps/<UUID>.jsonz are updated with the 1.7 push to retrieve the processed_data:json string from HBase.  This is currently possible by using the Python layer to invoke the method get_processed_json_as_string(ooid).
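As a rough sketch of what that update could look like, the handler below maps a /dumps/<UUID>.jsonz path onto a call to get_processed_json_as_string(ooid). Only that method name comes from this bug. StubClient, serve_processed, the path pattern, and the 404 behavior are assumptions; in particular, how the real client signals a missing row is not stated here, so the KeyError handling reflects the stub only.

```python
import re

# Matches the path portion of the existing URL, e.g. /dumps/<UUID>.jsonz
_DUMP_PATH = re.compile(r"^/dumps/(?P<ooid>[0-9a-fA-F-]+)\.jsonz$")


class StubClient:
    """Hypothetical stand-in for the Python layer that talks to HBase."""

    def __init__(self, rows):
        self._rows = rows

    def get_processed_json_as_string(self, ooid):
        # The stub raises KeyError for unknown ids; the real client may differ.
        return self._rows[ooid]


def serve_processed(path, client):
    """Return (status, body) for a request path, reading from the client."""
    match = _DUMP_PATH.match(path)
    if match is None:
        return 404, ""
    try:
        body = client.get_processed_json_as_string(match.group("ooid"))
    except KeyError:
        return 404, ""
    return 200, body
```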

Assignee: nobody → deinspanjer

Target Milestone: Future → 1.7

Austin King [:ozten]

Comment 6

•

14 years ago

(In reply to comment #5)

> The PHP app has no need to retrieve the original meta_data:json or the
> raw_data:dump, correct?  Currently, code is written in the monitor and
> processors to retrieve that data.

If a user is authorized, they can access the original metadata and raw_data files via Apache. We should continue to support this.

Daniel Einspanjer [:dre] [:deinspanjer]

Assignee

Comment 7

•

14 years ago

Okay, then we need to ensure the pythonic middleware supports calls to get_json_meta_as_string(ooid) and get_dump(ooid).

Please note that the more I type these method names the more I think we should have better names for them in hbaseClient.py. :)

Lars, how hairy would a cleanup refactoring be? Could we determine official names for these important methods and check in the code by tomorrow's code freeze?
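One low-risk shape such a cleanup could take is a thin wrapper that exposes shorter names and delegates to the existing methods, so current callers keep working. The three delegated method names are the ones mentioned in this bug; the wrapper class and the new names (get_meta, get_raw_dump, get_processed) are purely hypothetical and are not proposed official names.

```python
class CrashClient:
    """Hypothetical wrapper giving shorter names to the existing methods."""

    def __init__(self, legacy_client):
        # legacy_client is any object that provides the three existing methods.
        self._legacy = legacy_client

    def get_meta(self, ooid):
        # Original submitted JSON.
        return self._legacy.get_json_meta_as_string(ooid)

    def get_raw_dump(self, ooid):
        # Minidump binary.
        return self._legacy.get_dump(ooid)

    def get_processed(self, ooid):
        # Processed ("jsonz") data.
        return self._legacy.get_processed_json_as_string(ooid)
```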

K Lars Lohn [:lars] [:klohn]

Comment 8

•

14 years ago

Cleanup refactoring would not be difficult, and I highly encourage it.

BTW, earlier this afternoon, I checked in routines for the pythonic middleware that fetch things from hbase:

.../201005/crash/meta/by/uuid/4c0a21db-aeb8-4f5b-8fea-36a402100512
.../201005/crash/raw_crash/by/uuid/4c0a21db-aeb8-4f5b-8fea-36a402100512
.../201005/crash/processed/by/uuid/4c0a21db-aeb8-4f5b-8fea-36a402100512

I just haven't documented it yet.
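Until that documentation exists, the pattern in the three paths above can be captured in a small helper. This sketch builds only the visible part of each path; the elided prefix is left out because it is not given here, and the meaning of the 201005 segment is not stated, so it is passed through as an opaque value. The function and parameter names are hypothetical.

```python
# Resource names as they appear in the three example paths.
RESOURCES = ("meta", "raw_crash", "processed")


def crash_path(resource, uuid, leading_segment="201005"):
    """Build the visible portion of a middleware path for one crash.

    The host and any prefix before leading_segment are not known from the
    examples and must be supplied by the caller.
    """
    if resource not in RESOURCES:
        raise ValueError("unknown resource: %r" % (resource,))
    return "/%s/crash/%s/by/uuid/%s" % (leading_segment, resource, uuid)
```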

Daniel Einspanjer [:dre] [:deinspanjer]

Assignee

Updated

•

14 years ago

Blocks: 565692

Daniel Einspanjer [:dre] [:deinspanjer]

Assignee

Updated

•

14 years ago

No longer blocks: 565692

K Lars Lohn [:lars] [:klohn]

Updated

•

14 years ago

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

13 years ago

Component: Socorro → General

Product: Webtools → Socorro

Categories

(Socorro :: General, task)

Tracking

(Not tracked)

People

(Reporter: dre, Assigned: dre)

Security

(public)
