Closed Bug 561235 Opened 15 years ago Closed 12 years ago

Make Talos use mozcrash for minidump processing

Categories

(Testing :: Talos, defect, P4)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Unassigned)

References

Details

(Whiteboard: [talos][buildfaster:p2][mozbase])

Attachments

(2 obsolete files)

We could save a lot of time and bandwidth if we only download symbols on crash. This would require changes to talos, so it could be passed a URL to download and unpack.
Summary: Only download symbols on a crash → Talos - only download symbols on a crash
Is this still needed if the symbol files are 2.0MB?
Symbols for OSX are still 15MB. Not as critical to do right away, but would be nice to do soon.
Priority: -- → P4
We should make use of the cgi that's going to be set up to handle this.
Depends on: 561754
Blocks: 561754
No longer depends on: 561754
I'm going to write some Python code in bug 563745 to do this, so we can probably just copy and paste it to Talos for now.
I wrote a test script: http://hg.mozilla.org/users/tmielczarek_mozilla.com/minidump-stackwalk-cgi/file/9c6291a68500/testsubmit.py and wound up using that code almost verbatim in the unittest harnesses.
Blocks: 661585
What would be required to make this happen? My understanding of this area is limited.
I think we should go with the approach I described in comment 4 and comment 5. I wrote a CGI that accepts a minidump + a URL to symbols, and produces a stack trace. This way, the slaves don't have to download anything, only the server does (and it can cache symbols so it only has to download them once).
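The "download them once" part of that approach can be sketched roughly as below. This is an illustration only, not the code in the actual CGI: the function name `cached_symbols_path`, the `fetch` hook, and the cache layout are all assumptions made for the example.

```python
import hashlib
import os
import urllib.request


def cached_symbols_path(symbols_url, cache_dir, fetch=None):
    """Return a local path for symbols_url, downloading only on a cache miss.

    `fetch(url, dest)` is a hypothetical hook used for illustration;
    the default simply downloads the URL with urllib.
    """
    if fetch is None:
        fetch = lambda url, dest: urllib.request.urlretrieve(url, dest)
    os.makedirs(cache_dir, exist_ok=True)
    # Key the cache on a hash of the URL so any URL maps to a safe filename.
    key = hashlib.sha256(symbols_url.encode("utf-8")).hexdigest()
    dest = os.path.join(cache_dir, key + ".zip")
    if not os.path.exists(dest):
        # Download to a temporary name first so a failed download never
        # leaves a partial file that looks like a valid cache entry.
        tmp = dest + ".part"
        fetch(symbols_url, tmp)
        os.replace(tmp, dest)
    return dest
```

With something like this on the server, a second crash report for the same build would reuse the symbols already on disk instead of fetching them again.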
(In reply to comment #7)
> I think we should go with the approach I described in comment 4 and comment
> 5. I wrote a CGI that accepts a minidump + a URL to symbols, and produces a
> stack trace. This way, the slaves don't have to download anything, only the
> server does (and it can cache symbols so it only has to download them once).
I agree that's a fine solution. What are the steps to roll this into production?
(In reply to comment #8)
> (In reply to comment #7)
> > I think we should go with the approach I described in comment 4 and comment
> > 5. I wrote a CGI that accepts a minidump + a URL to symbols, and produces a
> > stack trace. This way, the slaves don't have to download anything, only the
> > server does (and it can cache symbols so it only has to download them once).
>
> I agree that's a fine solution. What are the steps to roll this into
> production?
1) need a host to put said CGI on
2) test harnesses need to know how to interact with the CGI
3) change buildbot code to stop downloading/unpacking of symbols and instead pass symbol URL and CGI URL to test harnesses
4) ???
5) profit!
2 is fixed for unittests, but not for Talos. Should be easy enough to port the automation.py code to Talos.
(In reply to comment #10)
> 2 is fixed for unittests, but not for Talos. Should be easy enough to port
> the automation.py code to Talos.
I wonder about failover behaviour too here. Should we retry, or specify an alternate server to talk to, or just accept that sometimes the CGI won't be available?
If we were to fix bug 642167, it might not be a big deal if the CGI doesn't respond.
Whiteboard: [talos] → [talos][buildfaster:p2]
Component: Release Engineering → Talos
Product: mozilla.org → Testing
QA Contact: release → talos
Summary: Talos - only download symbols on a crash → Talos - send minidumps to stackwalk cgi for processing
Version: other → unspecified
Porting the unittest code to Talos would just involve reusing the code here: http://mxr.mozilla.org/mozilla-central/source/build/automationutils.py#103
Assignee: nobody → wlachance
So here's a first cut at making Talos able to use the CGI server for parsing crash dumps. I copied the code from automationutils.py into a separate "crashhandler.py" module inside Talos. The idea here is that we might eventually want to factor crash parsing out into MozBase, and it'll be easier to do that if we know that we're using virtually the same code in automationutils.py and Talos. The behaviour for choosing a minidump crash parser is slightly different between automationutils.py and Talos, and I opted to go with the former's behaviour. If the user wants to do local minidump crash parsing, they'll need to set the MINIDUMP_STACKWALK environment variable to the path of a minidump crash parser. Before, Talos would try to guess what platform the user was on and set the minidump parser to the appropriate file checked into Talos inside the breakpad subdirectory. I haven't really thought enough about this to know which approach is really "better", so I opted for that of automationutils.py because I guessed it was touched most recently. I may have made the wrong call.
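For readers unfamiliar with the environment-variable approach, a minimal sketch of that lookup might look like the following. It is not the code from the attached patch: the function name `find_stackwalk_binary` and the check that the path points at an existing file are assumptions made for illustration.

```python
import os


def find_stackwalk_binary(environ=None):
    """Return the MINIDUMP_STACKWALK path if it names an existing file.

    Returns None when the variable is unset or does not point at a file,
    in which case the caller would skip local minidump processing.
    The existence check is an assumption of this sketch.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("MINIDUMP_STACKWALK")
    if path and os.path.isfile(path):
        return path
    return None
```

The trade-off is that nothing is guessed per platform: a user who does not set the variable gets no local stack walking at all.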
Attachment #552246 - Flags: review?(ted.mielczarek)
Comment on attachment 552246: Add support to talos for use of CGI crashhandler
I'm not a Talos peer, so you'll probably want Alice to review this. (Also I wrote all the code you copied there, so it would be a bit inappropriate for me to review it!) Just one note, you're using the poster lib here, you'll need to hg add poster.zip as well.
Attachment #552246 - Flags: review?(ted.mielczarek) → review?(anodelman)
This adds poster.zip, which is required on systems without this package installed.
Attachment #552246 - Attachment is obsolete: true
Attachment #552372 - Flags: review?(anodelman)
Attachment #552246 - Flags: review?(anodelman)
Apparently we're planning to take a different approach to this due to load issues on the buildmaster (Bug 679759). I'll wait until that cooks, then probably adapt it to Talos. Can hold off on reviews until then.
Being sick and falling behind on reviews pays off! Can you remove the review flag until you are ready to go? Otherwise it will keep showing up in my queue.
Comment on attachment 552372: Add support to talos for use of CGI crashhandler (take 2)
Unassigning anode as reviewer
Attachment #552372 - Flags: review?(anodelman)
Summary: Talos - send minidumps to stackwalk cgi for processing → Talos - download symbols on crash as required
Attachment #552372 - Attachment is obsolete: true
Yeah, we should implement the same approach used in bug 679759 for Talos.
Any progress here?
AFAIK, no one is actively working on this. If it is a high priority, we should probably find someone to take it.
(In reply to Jeff Hammel [:jhammel] from comment #22)
> AFAIK, no one is actively working on this. If it is a high priority, we
> should probably figure out someone.
If it's not super high priority (and I'm guessing it isn't if we managed to get this far without doing it), it would make a nice first bug for someone.
It would reduce the run time for each job that does not crash, since the 1) download, 2) unzip and 3) remove steps for symbols would not be executed. In other words, it would slightly increase our capacity.
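The saving described above comes from making the symbol download conditional on a crash. A rough sketch of that control flow, assuming minidumps show up as `*.dmp` files in a known directory; `symbols_if_crashed` and the `download_and_unpack` callback are hypothetical names used only for this example:

```python
import glob
import os


def symbols_if_crashed(dump_dir, symbols_url, download_and_unpack):
    """Fetch symbols only when at least one minidump exists in dump_dir.

    `download_and_unpack(url)` is a hypothetical callback that returns
    the local symbols path. Returns (dump_paths, symbols_path), with
    symbols_path set to None when there was no crash.
    """
    dumps = sorted(glob.glob(os.path.join(dump_dir, "*.dmp")))
    if not dumps:
        # No crash: skip the download, unzip and remove steps entirely.
        return [], None
    return dumps, download_and_unpack(symbols_url)
```

Jobs that finish cleanly then never touch the symbols archive, which is where the time and bandwidth saving would come from.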
Any progress? This is blocking some work in RelEng that would reduce load (see bug#561754 for details).
I am not currently working on this; removing myself as assignee to correct that impression.
Assignee: wlachance → nobody
Do we have a preferred solution for this?
Yeah, this is implemented in mozcrash. We should just use that.
Summary: Talos - download symbols on crash as required → Make Talos use mozcrash for minidump processing
Whiteboard: [talos][buildfaster:p2] → [talos][buildfaster:p2][mozbase]
(In reply to Ted Mielczarek [:ted] from comment #29)
> Dup of bug 675688 now?
I think bug 675688 should be closed and this bug kept open for the talos side of things.
Depends on: 813132
Fixed in bug 824984.
Status: NEW → RESOLVED
Closed: 12 years ago
Depends on: 824984
Resolution: --- → FIXED
Depends on: 819038