Closed Bug 642167 Opened 11 years ago Closed 8 years ago

upload minidump *.dmp files to well known location

Categories

(Release Engineering :: General, defect, P5)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Assigned: catlee)

References

Details

(Whiteboard: [unittest][ftp][logs])

Once bug 544062 is fixed, we should set that environment variable so that *.dmp files are created in a well-known location, such as the same directory as the rest of the test results.

At the end of the test, when uploading log files, we should search for and upload any *.dmp files. These files are only created if there is a crash during the test run, so if we find any, we should upload them all.
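
A minimal sketch of the "search for any *.dmp files" step, assuming the dumps land somewhere under the test results directory. The function name `find_minidumps` and the `results_dir` parameter are hypothetical and do not come from any existing harness code; the upload itself is left out.

```python
from pathlib import Path


def find_minidumps(results_dir):
    """Return every *.dmp file under results_dir, sorted for stable output.

    An empty list is the normal case: dumps only exist if something
    crashed during the test run.
    """
    root = Path(results_dir)
    return sorted(p for p in root.rglob("*.dmp") if p.is_file())
```

If the list is non-empty, every entry would be handed to whatever step already uploads the logs.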
Blocks: 495464
No longer depends on: 544062
Whiteboard: [unittest][ftp][logs]
Priority: -- → P5
No longer blocks: 495464
This is hard because we've designed our test pool to not be able to write to files on ftp. If we did this, we would have to split up our test pool into try and non-try pools, or somehow do access control based on the type of job being run...
That is unfortunate. :-/ Any thoughts on how we could work around this? Subvert it by making the stackwalk.cgi send them somewhere?
Notes from today's meeting with taras and joduinn:

* does not have to be ftp. Uploading to another well-known location, like a separate S3 share, would be equally fine. Specifically, not uploading to ftp.m.o helps unblock RelEng because:
** using a separate share (not ftp) avoids permission and security concerns with our one shared test pool, which intentionally does not have write access to ftp.m.o.
** space on ftp.m.o is a concern, so creating this on S3 avoids a further space crunch on ftp.m.o
** need to file dep bugs for creating this share
** tweaking summary to match.

* maybe as try-only, with a mozconfig setting?
** taras and joduinn to chat with their groups to see if non-default on try is sufficient

* each run typically generates 50 MB, but can go up to 1 GB.
** could be run on any given test suite, but a human would most likely only ever run it on one suite for a given build. joduinn was concerned about generating this amount of data for *each* test suite run per build... that's a big storage impact.

* retention of .dmp files:
** do not need this run on every build, or even on every nightly. Probably only run when a human developer sees the need, maybe once a month? every X days?
** don't need to keep .dmp files / data for long. Can safely delete after 1-2 business days
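
A rough sketch of how the "delete after 1-2 business days" retention check might look. Everything here is an assumption for illustration: the function names are hypothetical, "business day" is taken to mean Monday through Friday with no holiday calendar, and the default cutoff of 2 days is just the upper end of the range above.

```python
from datetime import date, timedelta


def business_days_between(start, end):
    """Count weekdays (Mon-Fri) after `start`, up to and including `end`."""
    count = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            count += 1
    return count


def is_expired(uploaded_on, today, max_business_days=2):
    """True once more than max_business_days weekdays have passed."""
    return business_days_between(uploaded_on, today) > max_business_days
```

Counting weekdays rather than calendar days means a dump uploaded on a Friday is still around on Monday, which seems to be the point of saying "business days".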
Component: Release Engineering → Release Engineering: Developer Tools
QA Contact: hwine
Summary: upload any minidump *.dmp files to ftp.m.o alongside test results and logs → upload minidump *.dmp files to well known location
This sounds good. It will work for our current use cases of [SPS] profiling and xperf dumps.
Thanks for looking into this!

(In reply to John O'Duinn [:joduinn] from comment #3)
> Notes from today's mtg w/taras and joduinn: 
> 
> * does not have to be ftp. Uploading to another well known location, like a
> separate s3 share,  would be equally fine. Specifically not uploading to
> ftp.m.o helps unblock RelEng because:

This is totally fine, as long as the files are accessible and we can easily wire support for finding them into TBPL.

> * maybe as try-only, with a mozconfig setting?
> *** taras and joduinn to chat with their groups to see if non-default on try
> is sufficient

Try-only would be okay as a first-phase rollout, but ideally we really do want this on all branches, because it's not uncommon to hit crashes on all trees that aren't very actionable.

> * each run generates typically 50mb, but can go up to 1gb.

Not sure where this number comes from? In the case of minidump files a normal run does not generate any. Only for test failures due to crashes do we generate any, and normal minidumps are very small (100-200KB).

> * retention of .dmp files: 
> *** do not need this run on every build, or even on every nightly. Probably
> only run when human developer sees need, maybe once a month? every X days?

I'm not sure what "don't need to run this on every build" means. We'd only generate the dumps in exceptional circumstances (test failures where we had a crash), and we'd want to save them every time that happens. 

> *** dont need to keep .dmp files / data for long. can safely delete after
> 1-2 business days

This is probably fine. We can always attach them to a bug if we need long-term storage. There aren't any privacy issues here since they're generated from our testing infra.
Assignee: nobody → mtabara
Product: mozilla.org → Release Engineering
Quick status update on this bug, since most of the action has been happening in bug 749421.

We've had this running on Cedar for a few weeks. For example,
https://tbpl.mozilla.org/php/getParsedLog.php?id=27994107&tree=Cedar&full=1#error1

has a crash which has been uploaded here:

http://mozilla-releng-blobs.s3.amazonaws.com/blobs/cedar/sha512/e8f39971caf5fc13c31a3d6b764d740d50f1ad2fa5902d2170d2f4aa1c6d5829bdba163d5e9680014d36010e0946fcbbcbeeda15517aa59eecb610c6e0a833e0

We're hoping to have this deployed to other branches over the next week or so.
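
For anyone trying to locate an upload, the URL above appears to follow the pattern blobs/<branch>/sha512/<hex digest>. A minimal sketch of building such a URL, assuming the last path component is the SHA-512 hex digest of the file contents (the path suggests this, but it is not confirmed in this bug); the helper name `blob_url` is hypothetical.

```python
import hashlib

# Base taken from the example URL in this comment.
BLOB_BASE_URL = "http://mozilla-releng-blobs.s3.amazonaws.com/blobs"


def blob_url(branch, data):
    """Build a content-addressed URL for `data` (bytes) on `branch`.

    Assumption: the key is the SHA-512 hex digest of the file contents.
    """
    digest = hashlib.sha512(data).hexdigest()
    return f"{BLOB_BASE_URL}/{branch}/sha512/{digest}"
```

With this kind of scheme, identical files map to the same URL, so uploading the same dump twice would not take up extra space.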
(In reply to Chris AtLee [:catlee] from comment #6)
> Quick status update on this bug, since most of the action has been happening
> in bug 749421.
> 
> We've had this running on Cedar for a few weeks. For example,
> https://tbpl.mozilla.org/php/getParsedLog.php?id=27994107&tree=Cedar&full=1#error1
> 
> has a crash which has been uploaded here:
> 
> http://mozilla-releng-blobs.s3.amazonaws.com/blobs/cedar/sha512/e8f39971caf5fc13c31a3d6b764d740d50f1ad2fa5902d2170d2f4aa1c6d5829bdba163d5e9680014d36010e0946fcbbcbeeda15517aa59eecb610c6e0a833e0
per quick chat w/Taras, these are working great for him. W00t!


> We're hoping to have this deployed to other branches over the next week or
> so.
Cool. 


After we deploy to other branches, anything else left to do here before closing with resounding FIXED? :-)
Nope, this is perfect on Cedar, so it's FIXED as soon as it hits the rest of the branches.
Assigning to catlee for now.
Assignee: tabara.mihai → catlee
I think this is done now that we've got blobber on all branches.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: Tools → General