Closed Bug 914106 Opened 11 years ago Closed 9 years ago

Add table to store symbols -> build mapping, like current symbols.txt files (for cleanup)

Categories

(Socorro :: General, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ted, Assigned: peterbe)

Details

(Whiteboard: [symbols])

Currently our symbol store has a symbols.txt file for each build that gets symbols uploaded, like:
http://symbols.mozilla.org/firefox/firefox-18.0.2-WINNT-20130201065344-x86-symbols.txt

The only thing we actually use this for is symbol cleanup. We have a script:
http://hg.mozilla.org/build/tools/file/tip/buildfarm/breakpad/cleanup-breakpad-symbols.py

that gets run on a daily cron to clean up out-of-date symbols. It reads all the symbols.txt files, figures out which builds to delete (by using the build id as a date), and then figures out which symbol files to delete.

The only tricky bit here is that the same symbol file can be referenced by multiple builds, since compiling the same exact source with the same compiler+flags will result in the same binary and same symbol file. The cleanup script works with this by reference-counting all the symbol files, and only deleting them when their refcount drops to zero.
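As a rough illustration of the reference-counting approach described above (not the actual cleanup-breakpad-symbols.py code), here is a minimal Python sketch. It assumes each symbols.txt has already been parsed into a list of symbol file paths keyed by build id, and that the build id is a 14-digit timestamp like the one in the example URL. The function names, the `max_age_days` parameter, and the paths in the usage note are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def build_date(build_id):
    """Interpret a 14-digit build id (e.g. 20130201065344) as a timestamp."""
    return datetime.strptime(build_id, "%Y%m%d%H%M%S")


def files_to_delete(builds, now, max_age_days):
    """Return (symbol files safe to delete, expired build ids).

    `builds` maps a build id to the list of symbol file paths listed in
    that build's symbols.txt (hypothetical, pre-parsed input).
    """
    cutoff = now - timedelta(days=max_age_days)

    # Count how many builds reference each symbol file.
    refcount = Counter()
    for files in builds.values():
        refcount.update(set(files))

    # Builds whose build id dates them before the cutoff are expired.
    expired = [b for b in builds if build_date(b) < cutoff]

    # Drop one reference per expired build; a file is only deletable
    # once no remaining build references it.
    for b in expired:
        for path in set(builds[b]):
            refcount[path] -= 1

    deletable = sorted(p for p, n in refcount.items() if n == 0)
    return deletable, sorted(expired)
```

For example, if an old build and a recent build both list `shared.sym`, expiring the old build removes only the files unique to it; `shared.sym` survives because its count is still 1.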

We'll need to preserve this data in the symbol db so that we can run an equivalent cleanup process.
Assignee: nobody → sdeckelmann
Whiteboard: [symbols]
Blocks: 1071724
Blocks: 1085530
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #0)
> 
> We'll need to preserve this data in the symbol db so that we can run an
> equivalent cleanup process.

When you say "symbol db", do you mean this https://crash-stats.mozilla.com/admin/symbols-uploads/ ?
I don't actually have access to view that page, but Selena and I had been talking about building some data store to replace the .txt files. rhelmer suggested that what you have might be the first step.
I feel like I'm jumping into this discussion cold, but in the webapp we unpack the incoming file and log its contents as a listing of the files inside it.
You can see what we're storing here: https://github.com/mozilla/socorro/blob/9e8fec7d47fc23e51a28f07d4ee12c81c8a2d609/webapp-django/crashstats/symbols/models.py#L27-L33

(Note! When we switch to S3 we're no longer going to record `file` and instead add a new column called `url`)

Perhaps this is the domain where we'd do processing for your cleanup-abilities.
(In reply to Peter Bengtsson [:peterbe] from comment #3)
> I feel like I'm jumping in cold into this discussion but on the webapp, we
> unpack the incoming file and take log of the content of the file by listing
> its content. 
> You can see what we're storing here:
> https://github.com/mozilla/socorro/blob/9e8fec7d47fc23e51a28f07d4ee12c81c8a2d609/webapp-django/crashstats/symbols/models.py#L27-L33
> 
> (Note! When we switch to S3 we're no longer going to record `file` and
> instead add a new column called `url`)
> 
> Perhaps this is the domain where we'd do processing for your
> cleanup-abilities.

Yes, this is what I was thinking of. I think it has the basic info we need (when the file was uploaded, what the contents were, and who uploaded it).

We should be able to run a query to figure out when files have aged out.
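To make the idea of such a query concrete, here is a hedged sketch using Python's sqlite3 module. The schema is an assumption for illustration only and is not the Socorro model linked above: a hypothetical `uploads` table with a `created` timestamp stored as ISO-8601 text, and a hypothetical `upload_files` table with one row per symbol file in each upload. A file counts as aged out only when every upload that contains it is older than the cutoff, which keeps the shared-file behavior of the refcounting cleanup.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE uploads (id INTEGER PRIMARY KEY, created TEXT NOT NULL);
        CREATE TABLE upload_files (
            upload_id INTEGER NOT NULL REFERENCES uploads(id),
            path TEXT NOT NULL
        );
        """
    )
    return conn


def aged_out_paths(conn, cutoff):
    """Return paths whose most recent upload is older than `cutoff`.

    `cutoff` is an ISO-8601 string; ISO-8601 text in a single fixed
    format sorts in chronological order, so plain comparison works.
    """
    rows = conn.execute(
        """
        SELECT f.path
        FROM upload_files AS f
        JOIN uploads AS u ON u.id = f.upload_id
        GROUP BY f.path
        HAVING MAX(u.created) < ?
        ORDER BY f.path
        """,
        (cutoff,),
    )
    return [row[0] for row in rows]
```

Grouping by path and testing `MAX(created)` plays the role of the refcount: a path referenced by any sufficiently recent upload is never returned.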
(In reply to Peter Bengtsson [:peterbe] from comment #3)
> I feel like I'm jumping in cold into this discussion but on the webapp, we
> unpack the incoming file and take log of the content of the file by listing
> its content. 
> You can see what we're storing here:
> https://github.com/mozilla/socorro/blob/9e8fec7d47fc23e51a28f07d4ee12c81c8a2d609/webapp-django/crashstats/symbols/models.py#L27-L33
> 
> (Note! When we switch to S3 we're no longer going to record `file` and
> instead add a new column called `url`)
> 
> Perhaps this is the domain where we'd do processing for your
> cleanup-abilities.

Thanks, I hadn't seen that! This is definitely what I had in mind. I'm a little worried that that's not an optimal schema for what we need it for, but I think we can cross that bridge when we get there. I'd also like to extend it to store some additional metadata (right now we rely on the bits that are in the *symbols.txt filenames), but we can look into that after we get everything switched over to use the API.
I'm going to call what you've implemented "good enough", and we can file any improvements as followup bugs. Thanks!
Assignee: sdeckelmann → peterbe
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED