Switch socorro to store dumps by day.

RESOLVED FIXED in 1.2

Status

Socorro
General
P2
major
RESOLVED FIXED
8 years ago
6 years ago

People

(Reporter: aravind, Assigned: griswolf)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [crashkill][crashkill-metrics])

(Reporter)

Description

8 years ago
Currently all layers of the stack store and process minidumps in the root/date and root/name hierarchies with cross links between the name and date trees.  I would like to change that so things are now stored in the root/YYYYMMDD/date and root/YYYYMMDD/name hierarchies.  The same symlinks between the two can still exist, but there is probably no reason for the symlinks anymore (I will let you guys decide that).


This will make my life simpler in a few different ways.

1. It will make deletions a lot easier.

2. It will make it trivial to export something like a day of data to other consumers.

3. It gives me the flexibility to split those mount points off into different file systems (or even different backend stores), which makes it easier to grow file system capacity.
This idea has merit. I would modify it slightly: we wouldn't need the whole date hierarchy repeated deeper in the tree, because only the time portion would be required.

$ROOT/YYYYMMDD/time/HH/MM/5S/webhead
           .../name/xx/yy/.../xxyyzz...json

Given a uuid, we can still look it up rapidly, because we've got the YYMMDD at the end.  Looking up by date is also just as easy.

Care would have to be taken in the monitor when looking for new crashes.  The detail will require some thinking...
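
For illustration only, here is a rough Python sketch of how paths in the proposed layout might be derived. The function names, the two-level fan-out on the name branch, and the 5-minute slice default are assumptions, not the code that was written for this bug. The time branch is simplified to hour and slice-start minute; the components below MM in the layout above are not reproduced.

```python
import posixpath
from datetime import datetime


def name_dir(root, uuid, depth=2):
    """Directory on the name branch for a crash uuid.

    Assumes the uuid ends in YYMMDD and that the fan-out uses the
    first `depth` two-character pairs of the uuid (both hypothetical).
    """
    day = "20" + uuid[-6:]  # YYMMDD suffix -> YYYYMMDD
    pairs = [uuid[2 * i:2 * i + 2] for i in range(depth)]
    return posixpath.join(root, day, "name", *pairs)


def time_dir(root, when, webhead, slice_minutes=5):
    """Directory on the time branch for a submission time."""
    # Round the minute down to the start of its time slice.
    minute = (when.minute // slice_minutes) * slice_minutes
    return posixpath.join(
        root, when.strftime("%Y%m%d"), "time",
        when.strftime("%H"), "%02d" % minute, webhead)
```

Both branches start with the same YYYYMMDD directory, so deleting or exporting one day means touching a single subtree.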
(Reporter)

Updated

8 years ago
Blocks: 523650
Per Aravind: We want a probable maximum of 1000 files per directory, with a structure no more than 8 or so deep, prefer 5 or under: http://communities.netapp.com/message/5790

A single day from September (covering parts of the 28th and 29th) had a maximum of 372 crashes per 5-minute time slice (median of 296, average of 240). If we allow for a 10X increase, that puts us at about 3K files per 5 minutes (median). The code already allows for a time slice of any number of minutes, so 1 minute slices would be fine for even a 10X increase. The ideal limit of 1K per directory is soft, not hard, so we can get away with a few that are a little bit over.
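
The arithmetic above can be checked with a small Python sketch. It assumes crashes arrive evenly within a slice and that each crash contributes one entry to a time-slice directory; both are simplifications, and the function name is made up for this example.

```python
def files_per_slice(per_5min, growth, slice_minutes):
    """Estimated entries per time-slice directory.

    per_5min: observed crashes per 5-minute slice
    growth: multiplier for expected traffic growth
    slice_minutes: width of the proposed time slice
    """
    return per_5min * growth * slice_minutes / 5.0


# Figures quoted in the comment above.
median, maximum = 296, 372

# 5-minute slices at 10X: well over the 1K soft limit.
print(files_per_slice(median, 10, 5))
# 1-minute slices at 10X: under 1K even for the busiest slice.
print(files_per_slice(maximum, 10, 1))
```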
This blocks 1.9.2 because it blocks (indirectly) a blocker, bug 523528.
Flags: blocking1.9.2?
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P2
Who owns this?
Whiteboard: [crashkill][crashkill-metrics]
Assignee: nobody → lars
No longer blocks: 523650
Lars kicks ass.  :)
And Frank handles some of the details...
Target Milestone: --- → 1.1
Assignee: lars → griswolf
Code available for review at http://code.google.com/p/socorro/source/browse/#svn/branches/dailystorage. Particularly note:
 .../socorro/lib/dumpStorage.py (new base class with modified layout)
 .../socorro/lib/JsonDumpStorage.py (modified to inherit DumpStorage)
 .../socorro/lib/processedDumpStorage.py (modified to inherit DumpStorage)
 .../socorro/lib/filesystem.py (added things used by the above files)

 .../unittest/lib/(see files above) were also changed or created
(there were a few other changes which I believe were minor)

All this at svn r1437
checked into trunk at Revision: 1450
Per recent phone conversation, IT (aravind) wants this change to be associated with a related change to webapp and apache as follows:

1: webapp stops making direct file system requests and instead routes all such things through apache via some small set of servicepoints

2: apache uses rewrite rules to convert simple http://servicepoint/uuid urls into the appropriate file fetch.
This seems like a lateral change.  How does this help?
(Reporter)

Comment 11

8 years ago
If we don't switch the webapp to just delegate things to apache, the webapp (in the worst case) will have to look at a ton of mount points every time someone clicks on a report.  It's currently configured to look into two locations for a jsonz file.  That will change to a steadily growing list of mount points (or directories).
Why don't we store filepath as metadata in the reports table?

How will you implement this using Apache rewrite rules? Why not have PHP grab this value from the DB and do a pass-thru.
(Reporter)

Comment 13

8 years ago
(In reply to comment #12)
> Why don't we store filepath as metadata in the reports table?

I don't know why we haven't done that in the past.  It seems like it would make it more rigid if we associate storage pattern with the uuids in the db.  We wouldn't have nearly the same kind of flexibility we have right now in changing the mount points etc.

> How will you implement this using Apache rewrite rules? Why not have PHP grab
> this value from the DB and do a pass-thru.

I was going to implement this with apache rewrite rules using perl regexps.  Depending on how fancy I want to get with it, I might even use a mod_perl script to feed apache two locations etc.

The general idea, however, is to use a series of regexps, starting with the most likely mount point and then going down the chain to look into older mount points.
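
As a sketch of that fallback chain only (in Python, not the actual Apache rewrite rules, which are not shown in this bug), the lookup might look like the following. The mount point names, the `.jsonz` file naming, and the two-level fan-out under `name` are assumptions for the example.

```python
import os
import posixpath


def candidate_paths(uuid, mount_points, suffix=".jsonz"):
    """Possible locations for a uuid, most likely mount point first."""
    day = "20" + uuid[-6:]  # the uuid ends in YYMMDD
    return [
        posixpath.join(mount, day, "name", uuid[0:2], uuid[2:4], uuid + suffix)
        for mount in mount_points
    ]


def find_dump(uuid, mount_points, exists=os.path.exists):
    """Return the first candidate that exists, or None if none do."""
    for path in candidate_paths(uuid, mount_points):
        if exists(path):
            return path
    return None
```

The caller only supplies a uuid. The ordered list of mount points lives in one place, so storage can be moved without changing the caller.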

Updated

8 years ago
Depends on: 528087
Pushing to 1.2.
No longer depends on: 528087
Target Milestone: 1.1 → 1.2

Updated

8 years ago
Depends on: 528087
The point, from a systems perspective, is information hiding: The actual location of files is not something the web app should care about. IT may have reasons to move files around, even move them off the server's drives. If IT is responsible for translating a URI into a file fetch, then the webapp can use the same URI to fetch the same file every time, and never know the difference. Loosely coupled (when feasible) is a good thing.

Perl regex may be overkill for this particular use (substrings with fixed indexes are sufficient), but "I don't need to know".
When are you going to push this?  Are we waiting until 1.2 launches or should we do it before then?
(Reporter)

Comment 17

8 years ago
(In reply to comment #16)
> When are you going to push this?  Are we waiting until 1.2 launches or should
> we do it before then?

If that question was directed at me, I am waiting for Austin to confirm that the reporting webapp no longer depends on scanning the filesystem looking for processed jsonz files.  Once that part is ready we can push this to staging.  I am not sure how this conflicts with or changes the work needed for 1.2.
(In reply to comment #17)
No, the webapp will continue to access the filesystem.

Yes, I forgot we need to make a simple 1 line change to the PHP code per Bug#528087.
(In reply to comment #17)
@aravind do you want a bug to track your Rewrite / Perl work for updating the /dumps/ urls?
Throwing off the boat.  -'ing.
Flags: blocking1.9.2+ → blocking1.9.2-
Totally the right choice - we already have the dumps, it's just implemented hackily AIUI.
Awaiting deployment
Status: NEW → ASSIGNED
This is now in production
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro