Bug 552539 (Closed) · Opened 15 years ago · Closed 15 years ago

.csv files missing data up until 0952 in report for 2010 03 14

Categories

(Socorro :: General, task)

Platform: x86
OS: macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: RESOLVED WONTFIX

People

(Reporter: chofmann, Assigned: lars)

Details

The first report in the file appears to be CFReadStreamGetStatus http://crash-stats.mozilla.com/report/index/6fae526e-117f-42fe-8553-bb7832100314, processed at 201003140953. We should have reports starting around midnight. The reports for 2010 03 13 and previous days look OK.
At about 9:30am on 3/14, I noticed that crashes were being accepted, but not processed. I raised a red flag and aravind found that somehow the Socorro collectors had come unhitched from their NFS mounts. Strangely, the collectors were not complaining, but continued to accept new crashes. Aravind was not able to determine where the collectors were silently stashing their treasured crash reports. He reconnected the NFS mounts and the collectors went on saving crashes in the proper locations.

Serendipitously, this was the first weekend that we were sending crash data to HBase at 100%. In a reversal of roles, we can recover all those lost crashes by piping them out of HBase and into NFS with a custom script. I am writing that script now and will give it to aravind to execute. However, it will likely be late Tuesday or Wednesday before we've recovered that data. Once done, we can re-run the dailyUrl script that produces your csv report.
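[For context, a minimal sketch of the kind of HBase-to-NFS recovery script described above. This is not Socorro's actual code: it assumes the happybase Thrift client, and the host, table name ('crash_reports'), column qualifiers ('raw:meta_data', 'raw:dump'), row-key prefix, and NFS directory layout are all hypothetical stand-ins for whatever the real schema used.]

```python
import json
import os

import happybase  # assumed HBase Thrift client; Socorro used its own hbaseClient module

HBASE_HOST = 'hbase-thrift.example.com'   # hypothetical Thrift gateway
NFS_ROOT = '/mnt/socorro/raw_crashes'     # hypothetical NFS mount point


def recover_crashes(date_prefix):
    """Copy raw crashes for one day out of HBase and back into NFS storage."""
    connection = happybase.Connection(HBASE_HOST)
    table = connection.table('crash_reports')
    # Assumes row keys begin with the crash date, e.g. '100314...'.
    for row_key, columns in table.scan(row_prefix=date_prefix.encode()):
        crash_id = row_key.decode()
        meta = json.loads(columns[b'raw:meta_data'])
        dump = columns[b'raw:dump']
        # Recreate the kind of fan-out directory layout a collector would use.
        crash_dir = os.path.join(NFS_ROOT, crash_id[:2], crash_id)
        os.makedirs(crash_dir, exist_ok=True)
        with open(os.path.join(crash_dir, crash_id + '.json'), 'w') as f:
            json.dump(meta, f)
        with open(os.path.join(crash_dir, crash_id + '.dump'), 'wb') as f:
            f.write(dump)
    connection.close()
```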
Assignee: nobody → lars
Status: NEW → ASSIGNED
ok, sounds good, thanks for the update. Usually data from a random day is not particularly interesting and we could get by without it. But in this case I'd like to do some analysis of possible crashes when the system clock gets shifted out from under Firefox, to see if we have crashes lurking there. The shift to daylight saving time that was happening across timezones on Sunday morning provides a unique opportunity for lots of clocks to be shifting around in a reliable way.
The last report I see for the 13th was processed at 2010 03 13 23:48, so the problem seems to have kicked in just before midnight.
I'm having some difficulties getting this script complete. The hbase json 'get' function returns an "unpacked" json object. My calls to this function are failing because what's in hbase is not a valid json string representation, so the json module cannot unpack it. Looking closely at what is returned, I find that the string looks like a json string, but is actually a pythonic stringified dictionary. These are similar in format, but not interchangeable.

Looking at the corresponding 'put' function of the hbase library, I find that it does not do the symmetric json conversion. It seems to expect that the calling code has already done the json dict-to-string conversion. The collector code currently in use is just passing in the json dict. The hbase python code is then inadvertently converting the json dict to a pythonic string and saving it, raising no error.

This is a prime example of where unit testing is not enough; we need some integration testing on this project. I can conceivably make progress on this script and still recover the 'lost' crashes, but it is going to take more time. Meanwhile, we also need a plan to correct the data in hbase.
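[A minimal Python illustration of the mismatch described above, not Socorro's code: storing str(dict) instead of json.dumps(dict) yields a "pythonic" string that json.loads cannot parse. The metadata keys and the client.put call in the final comment are hypothetical.]

```python
import json

meta = {'ProductName': 'Firefox', 'Version': '3.6', 'Hang': None}

stored_wrong = str(meta)         # "{'ProductName': 'Firefox', ...}" -- single quotes, None
stored_right = json.dumps(meta)  # '{"ProductName": "Firefox", ...}' -- double quotes, null

json.loads(stored_right)         # round-trips fine
try:
    json.loads(stored_wrong)     # fails: repr(dict) is not valid JSON
except json.JSONDecodeError as exc:
    print('cannot decode pythonic repr:', exc)

# The fix on the 'put' side is to serialize explicitly before handing the
# value to the HBase client, e.g. something like client.put(row_key, json.dumps(meta)).
```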
Target Milestone: --- → 1.7
assuming that we've gotten on just fine without these lost crashes...
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
Component: Socorro → General
Product: Webtools → Socorro