Bug 216827 - implement an open-sourced crash reporting tool to replace talkback.
Status: VERIFIED FIXED
Keywords: access, sec508
Product: Core
Classification: Components
Component: Build Config
Version: Trunk
Platform: All All
Importance: -- normal with 6 votes
Assigned To: Nobody; OK to take it and work on it
URL: http://code.google.com/p/google-break...
Duplicates: 118994
Depends on: 354980
Blocks: 350425
Reported: 2003-08-20 19:08 PDT by Rick Ju
Modified: 2010-12-03 08:09 PST
CC List: 55 users
vladimir: blocking1.9+


Attachments
Patch to include crashrep (326.08 KB, patch)
2006-03-17 15:35 PST, Alexander Opitz
The crashrep client as seen in the screenshots (38.12 KB, application/x-bzip2)
2006-03-17 15:56 PST, Alexander Opitz
The components part that calls the client on a crash (4.47 KB, application/x-bzip2)
2006-03-17 15:58 PST, Alexander Opitz

Description Rick Ju 2003-08-20 19:08:42 PDT
Why don't we get an open-source crash reporting tool to replace the proprietary
one? Could we use OpenOffice's or GNOME's bug-buddy?
Comment 1 Christian :Biesinger (don't email me, ping me on IRC) 2003-08-23 04:25:20 PDT
At least GNOME's bug-buddy requires shipping binaries with symbols, which would
increase the download size a lot.
Comment 2 David Bradley 2003-08-23 08:12:31 PDT
Yes, you really need to do what Talkback does, and catch the raw crash data,
send it back and pair it up with the symbols stored on a server. IMO this should
be an Open Source project all its own. I'd love to work on it. Alas, my full
time contributing days are coming to a close :-(
Comment 3 Robert Pollak 2003-08-29 00:07:47 PDT
In Windows OpenOffice, I see a "Sun Microsystems, Inc. Crashreporter v.1.1" with
LGPL license mentioned.
(File THIRDPARTYLICENSEREADME.html in OOo_1.1rc3_030813_Win32Intel_install_de.zip)

Which crash reporting tool do they use on Linux?
Comment 4 Robert Pollak 2003-08-29 00:30:42 PDT
I mailed hro@openoffice.org (Hennes Rohling), who owns the fixed OOo bug
"Crashreporter doesn't work under Linux"
<http://www.openoffice.org/project/www/issues/show_bug.cgi?id=17635>.

I also asked him which of the OOo people we could talk to about using their
tool in Mozilla.
Comment 5 Jason Clinton 2003-09-30 11:30:26 PDT
Is this the code we're looking for?
http://ooo.ximian.com/lxr/source/porting/sal/osl/unx/signal.c#282
Comment 6 David Bradley 2003-09-30 17:05:56 PDT
One issue to consider is the captured information and access to the database
containing it. Netscape's talkback database was restricted to employees, thus
this data wasn't accessible to the world. The chance of sensitive data being
captured in the dump is small, but it could happen.

I'm not sure there's a good answer for this. People could easily opt out if they
so choose, but ideally you want people to use it and feel reasonably safe in
doing so. Having a Talkback-like database open to the world might give cause for
more users to opt to turn it off.

Also, for Windows, you could create a mini dump via MiniDumpWriteDump. It's
not cross-platform, though. I've been able to get this working under Windows.
The catch is, dumps can be created on older OSes by supplying the dbghelp.dll,
but I think you'd need VC7 to view them. It might be possible to create a simple
app to generate a report similar to what Talkback provides.
Comment 7 Shiva Thirumazhusai 2003-10-01 05:57:46 PDT
Mozilla Foundation is setting up a separate infrastructure for Talkback. Stay
tuned for more details.

Comment 8 Jay Patel [:jay] 2006-01-25 11:00:52 PST
I wish I was cc'd on this bug a long time ago, because I have been thinking about this for a long time.  I just haven't had the time or resources to look into possible alternatives.  

The Sun OOo crash reporting tool is the first thing that comes to mind... so if anyone has contacts at Sun, please find out if it would be feasible to integrate their crash tool into other products like Firefox and Thunderbird.
Comment 9 Jay Patel [:jay] 2006-01-25 11:51:02 PST
One critical requirement for any such alternative is that it is cross-platform.  We cannot afford to integrate different tools for each platform, so if we do ever decide to go with a different system, it must work on at least the three major platforms (MacOSX, Windows, and Linux).
Comment 10 Olav Vitters 2006-01-25 12:28:49 PST
GNOME would like such a thing too. Preferably something that could deal with missing debug info (it should add the info back). This must work across distributions.
Perhaps distributions provide the debug packages to a GNOME server while the client provides MD5 sums of all libraries. Then it is the server's job to figure out what debug info to load.
Comment 11 Fernando Herrera 2006-01-25 15:13:16 PST
Not only is the reporting tool needed, but also the code for building the debug server, which has the debug info for the shipped binaries as well as a copy of every popular OS (some kind of jailed installations). A good interface for accessing this info would also be great.
Comment 12 Alexander Opitz 2006-02-09 13:03:29 PST
I've already worked on such a solution. I never found information on how GNOME's crash reporter works, so this is all built on my own. Take a look at: http://seamonkey.itkombinat.de/crashrep/
Comment 13 Jesus Cea 2006-03-15 04:49:10 PST
Very nice, Alexander. Only six incident reports? Is the service in production?

Is the tool really integrated in SeaMonkey? Are you in contact with Mozilla people to integrate such a tool in Firefox/Thunderbird?

The current situation is awful. I have tons of crashes and can't report/investigate them :-/
Comment 14 Henrik Skupin (:whimboo) 2006-03-15 08:17:27 PST
Jay, please take a look at comment 12. Is that what you've been waiting for? ;)
Comment 15 Alexander Opitz 2006-03-15 09:22:57 PST
@Jesus Cea

It isn't in production; it is only enabled in my own build. But I have too much work and I'm too often ill atm ... I hope I can move it forward.
Comment 16 Jesus Cea 2006-03-15 10:03:34 PST
Alexander, let me know if I can help. I'm a busy person, but this issue is a priority for me.
Comment 17 Alexander Opitz 2006-03-16 06:11:16 PST
@Jesus Cea

If you have Linux, I can give you my latest build and you can try out the crashes there.
Comment 18 Alexander Opitz 2006-03-17 15:35:42 PST
Created attachment 215450
Patch to include crashrep
Comment 19 Alexander Opitz 2006-03-17 15:56:34 PST
Created attachment 215452
The crashrep client as seen in the screenshots
Comment 20 Alexander Opitz 2006-03-17 15:58:09 PST
Created attachment 215454
The components part that calls the client on a crash
Comment 21 Alexander Opitz 2006-03-17 16:03:55 PST
This is the source that's needed by Mozilla for the Linux crash reporter.
Please excuse that the component's memory handling doesn't look very clean; my C knowledge is limited. Also, I don't know what state an application is in after crashing. That's why it looks so ugly.
Comment 22 Mark Mentovai 2006-03-20 21:22:52 PST
I've been working on this too.  I'm starting with the win32-specific pieces, and am working on a portable win32 digester that can walk dump stacks on any platform.  My goal is to handle crash reporting for all three major platforms, with portable digester code for each.
Comment 23 Robert Kaiser (not working on stability any more) 2006-03-21 03:44:17 PST
Mark: Sounds good - Alexander has major parts for the Linux client and the server side, you're working on a Windows solution, both have a cross-platform tool as a target...
Would it be possible to merge your approach into the work already attached here?

We should probably get the current stuff into the CVS tree (not built by default for now) and work on improving it from there.

Especially the parts of the patch that are not crashrep-specific files would need review though, I guess. Of course, review would also be nice for the crashrep stuff itself.
Comment 24 Fernando Herrera 2006-03-22 00:26:51 PST
Hi, Alexander, nice work. But using backtrace gives us interesting data only when debug info is present on the system. That will work only for people compiling the software themselves and people with *-debuginfo packages installed. So to get interesting backtraces in the most common user case we have two options:

a) Detect if no debuginfo is available (some little ELF magic) and ask the user to install the debuginfo package before getting the backtrace.

b) Include in the crash report the exact version of every mapped code (md5sum of binaries and libs) and re-create the backtrace on a master server with every known binary installed.

The problem I see with a) is that there is no standard way across multiple distributions to ask for a system package to be installed. Also I don't know if glibc backtrace is smart enough to get debuginfo from /var/debuginfo/* (I know that gdb is).

With b) we would need some kind of recreate-backtrace software and some way of installing every binary from a distro and creating a mapping md5sum-->binary. Of course a dedicated server would also be needed for each architecture.

What do you think about this?
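
For illustration only, the md5sum-->binary mapping from option b) could be sketched roughly as follows. This is a minimal sketch, not code from any attachment on this bug; the function names and the idea of scanning a single directory tree of installed binaries are assumptions.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root):
    """Map md5sum -> path for every regular file under root.

    'root' stands for a hypothetical directory holding the binaries
    and libraries of one distribution.
    """
    index = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Keep the first path seen for a given digest.
            index.setdefault(md5_of_file(path), str(path))
    return index
```

A server could then look up each md5sum reported by the client in such an index to find the matching binary with debug info.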
Comment 25 James Ross 2006-03-22 00:34:40 PST
If it is planned to replace the existing Talkback setup, the client portion absolutely must be able to get the info it needs *without* any debug symbols, and send the data to a server which can piece the data together into a real stack.
Comment 26 Benjamin Smedberg [:bsmedberg] 2006-03-22 05:03:58 PST
It should be simple enough to use the stackwalking code in nsStackFrameUnix to get the actual stack frame addresses, and then convert those to symbolic information on the server side.
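
As a rough illustration of the server-side half of that idea, raw frame addresses can be turned into symbolic names with a sorted symbol table and a binary search. This is a sketch under assumptions: the table format, the function name and the example symbols are made up, and it does not reflect how nsStackFrameUnix or any real server works.

```python
import bisect


def symbolize(address, symbols):
    """Resolve an address to 'name+0xoffset'.

    symbols: list of (start_address, name) tuples sorted by start_address
    (a hypothetical, already-loaded symbol table for one module).
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return hex(address)  # address lies below the first known symbol
    start, name = symbols[i]
    return f"{name}+0x{address - start:x}"


# Made-up symbol table for demonstration.
SYMBOLS = [(0x1000, "func_a"), (0x1400, "func_b"), (0x2000, "func_c")]
```

With this table, `symbolize(0x1410, SYMBOLS)` resolves to the nearest preceding symbol, `func_b`, plus an offset of `0x10`.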
Comment 27 Mark Mentovai 2006-03-22 07:02:41 PST
Where feasible, I'm actually hoping to send stacks for all threads up to the server.  The server will hold the symbols and will have a way of mapping builds to a set of image files.  Talkback uses build IDs for this purpose, we can do the same.

I'm also toying with the idea of doing some symbol-matching on the client side, to do a better job of presenting stack frames in system code without requiring the server to know too much about system libraries.

Everything will be done portably: there's no reason the servers will need to be tied to a specific architecture.  Pulling useful data out of the pdb files is the last question mark on my win32 punchlist.
Comment 28 Mark Mentovai 2006-03-22 07:10:14 PST
Funny how after you say something aloud, you get to thinking about it and come up with a solution.  .dbg files are more than enough for symbol-mapping, contain information in COFF format, and aren't likely to incompatibly change between MSVC releases like .pdb files.
Comment 29 timeless 2006-03-22 07:48:49 PST
yeah, but we're hoping to let the symbol servers do more than just get stacks eventually. if only simply make the pdbs available to people for debugging, although that could be handled differently as long as you can generate both dbg and pdb which i believe you can....

one plea. please please please don't use build ids. use the approach symstore uses, namely a very specific hash of each dll. that way if someone mixes dlls, or perhaps installs an extension (xforms, domi) from a different build you can still get correct stacks. this also enables us to deal w/ things for which we don't have symbols today, but perhaps someone else does (e.g. a plugin vendor).
Comment 30 Fernando Herrera 2006-03-22 07:51:37 PST
Also we may be interested in crashes happening in system library code, such as a wrong call to a GTK function.
Comment 31 timeless 2006-03-22 07:59:37 PST
get all the linux vendors to supply a standard symbolserver system like microsoft does and we'll talk. until then, that's a problem that's probably going to be too hard for us to solve.

linux users like building their own libraries, and often w/ slightly different options and w/o symbols, which means that you'll never find another box anywhere in the world that actually can answer symbol questions about the system library.

linux distros also like sticking things in very random places, i'd be very surprised if there was a standard location for symbol files that could be found by a distressed application (one that has already crashed, possibly crashed because the libraries it needed were on a network file system which has gone away). certainly trying to get function names for "system" libraries before sending the information to the server would be good. but i don't see any useful proposals for how one can do that.
Comment 32 Robert Kaiser (not working on stability any more) 2006-03-22 08:04:07 PST
We need to take care, though, that the server side can meet the storage requirements. Storing all .pdb files permanently can be quite costly on the storage side over time, from what I've heard (at least if we plan to use the same server for all nightlies of a bunch of different applications like Firefox, Thunderbird, XULRunner, SeaMonkey, Camino, etc.).
Comment 33 timeless 2006-03-22 08:28:51 PST
symbol storage can be distributed. symstore/symsrv rely on the concept of multiple stores, each of which has a database that maps a requested dll-hash to a symbol file.

note that there's nothing wrong w/ the symbol server knowing which pdbs belong to a given build, that's fairly standard for symstore. i just don't want the client to remember and try to suggest to the server that it use a build id as a lookup key. if two dlls have the same hash, then you can actually *save* space by only keeping a single copy of its pdb instead of one for each and every build where that library was the same.
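
The space saving described here can be sketched as a content-addressed store: symbol data is keyed by a hash of the library, and each build only records which hashes it contains. This is a minimal illustration; the class and method names are invented, and MD5 is a stand-in for whatever identifier a real store such as symstore computes.

```python
import hashlib


class SymbolStore:
    """Toy store that keeps one copy of symbol data per library hash."""

    def __init__(self):
        self.symbols_by_hash = {}  # library hash -> symbol data
        self.builds = {}           # build name -> {library name: hash}

    def add(self, build, library, library_bytes, symbol_data):
        key = hashlib.md5(library_bytes).hexdigest()
        # Identical libraries across builds share a single entry.
        self.symbols_by_hash.setdefault(key, symbol_data)
        self.builds.setdefault(build, {})[library] = key
        return key

    def lookup(self, library_hash):
        """Return symbol data for a hash, or None if it is unknown."""
        return self.symbols_by_hash.get(library_hash)
```

The server still knows which hashes belong to which build, but the client only ever needs to send the hash.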
Comment 34 Alexander Opitz 2006-03-22 08:42:32 PST
@Fernando Herrera from comment #24

I'm using way b)

After a crash the client sends the trace and the memory map (taken from /proc/self/maps) with a build id to the server ( http://seamonkey.itkombinat.de/crashrep ), where it is examined. It is sent via a POST request and not uploaded as a file.

@timeless

Yeah, I want to replace the build ID with an md5 hash so we can see if someone changed .so files. But that's for later, I think.


I know nothing about .pdb and so on on MS Windows systems, but I won't depend on MSVC or anything else. Maybe we'll want to switch to gcc or Watcom (who knows).
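
To make the trace-plus-memory-map idea concrete, here is a hedged sketch of how a server might use the Linux memory map to turn absolute frame addresses into (file, offset) pairs, and how a client might encode the report as form fields. It is not the crashrep code; the function names and the POST field names are hypothetical.

```python
from urllib.parse import urlencode


def parse_maps(text):
    """Parse /proc/<pid>/maps text into (start, end, file_offset, path) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue  # skip anonymous mappings and pseudo-entries like [heap]
        start, end = (int(x, 16) for x in fields[0].split("-"))
        regions.append((start, end, int(fields[2], 16), fields[5].strip()))
    return regions


def locate(address, regions):
    """Return (path, offset_in_file) for an address, or None if unmapped."""
    for start, end, file_offset, path in regions:
        if start <= address < end:
            return path, address - start + file_offset
    return None


def build_post_body(build_id, frames, maps_text):
    """Encode a report as form fields (field names are made up)."""
    return urlencode({
        "buildid": build_id,
        "trace": ",".join(hex(a) for a in frames),
        "maps": maps_text,
    })
```

Because the offsets are relative to the mapped file, the server can resolve them against its own copy of the binary regardless of where the library was loaded on the client.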
Comment 35 Mark Mentovai 2006-03-22 08:46:32 PST
Official win32 builds are produced with MSVC.  As such, that's all I really care about for the purposes of crash reporting on that platform for the time being.
Comment 36 Mark Mentovai 2006-03-22 08:50:22 PST
I don't see any problems with using a hash to identify libraries, but I'd want the client to at least specify a build ID as a backup hint.  There are cases where the hash of a client-side library might change: consider prebinding on OS X.
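
One way to read this suggestion is a two-step lookup: try the library hash first, and only fall back to the build ID hint when the hash is unknown (for example after prebinding changed the file). The sketch below assumes two plain dictionaries as the server's indexes; the names are illustrative only.

```python
def find_symbols(library, library_hash, build_id, by_hash, by_build):
    """Prefer an exact hash match; fall back to the (build ID, library) hint.

    by_hash:  dict mapping library hash -> symbol data
    by_build: dict mapping (build ID, library name) -> symbol data
    Both indexes are hypothetical.
    """
    if library_hash in by_hash:
        return by_hash[library_hash]
    return by_build.get((build_id, library))
```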
Comment 37 Christian :Biesinger (don't email me, ping me on IRC) 2006-03-22 09:00:16 PST
> Yeah, I wan't to replace the build ID with a md5 hash so we see if someone
> changed so files. But thats for later I think.

It would be nice to avoid changes to the protocol.
Comment 38 Benjamin Smedberg [:bsmedberg] 2006-03-22 09:10:39 PST
I'm going to be changing the buildid stuff on trunk to separate out various uses of buildids. For our purposes the buildID used by crashrep and update should be a long identifier such as "gaius-trunk-2006032212-en-US" that identifies the precise bits being staged. The buildID will be specified by the tinderbox configuration (thus debug homebrew builds won't have this kind of unique ID).

Don't let my plans get in the way of using NS_BUILD_ID or dllhashes for the time being until I can get my act in gear.
Comment 39 Alexander Opitz 2006-03-22 09:15:43 PST
@biesi

We don't have a protocol yet, and my apologies, I didn't mean replace ... I meant extend. So we only add one more parameter.

But before I have anything for win32 I won't call it a protocol.

@Mark

Is it possible to get the source for the walker?
On Linux I didn't implement a walker; I used the gtk(?) function for that. So maybe the Linux version needs its own walker. I don't like depending on /proc for the memory mapping.

I also won't use minidump as it seems to be available only from WinXP upwards.
Comment 40 Alexander Opitz 2006-03-22 09:30:07 PST
Ah, I forgot to upload the tinderbox patch, and as I speak of buildID ...

the build isn't identified by buildID, it is identified by a productID.

At the moment the following is done by the tinderbox script:

1) Build a debug-enabled build.
2) Connect via ssh to the crashrep server and log in to your account.
3) Start a script 'symbols/addnewbuild.php $vendor $name $type $extra $buildID $machine' that returns a new productID.
4) The tinderbox script copies all binaries into a directory and strips out everything that isn't needed for debug processing (like talkback seems to do).
5) The files are packaged into crashrep-$productID.tar.bz2.
6) The productID is written into crashrep.ini.
7) The files are uploaded to the server into ~/symbols.
8) Debug symbols are stripped from the originals (as normal in the tinderbox script).
9) Packaging proceeds as normal in the tinderbox script.
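
Steps 5 and 6 of the list above could be sketched as follows. This is only an illustration of the packaging idea, not the actual tinderbox patch (which was never attached here); the function names, the "symbols" directory name inside the archive, and the ini section and key names are all assumptions.

```python
import configparser
import tarfile
from pathlib import Path


def package_symbols(symbol_dir, out_dir, product_id):
    """Step 5: pack the debug files into crashrep-<productID>.tar.bz2."""
    archive = Path(out_dir) / f"crashrep-{product_id}.tar.bz2"
    with tarfile.open(archive, "w:bz2") as tar:
        # Store the files under a fixed top-level name (an assumption).
        tar.add(symbol_dir, arcname="symbols")
    return archive


def write_product_id(ini_path, product_id):
    """Step 6: record the productID in crashrep.ini (section/key are guesses)."""
    config = configparser.ConfigParser()
    config["crashrep"] = {"productID": str(product_id)}
    with open(ini_path, "w") as fh:
        config.write(fh)
```

The client would later read the productID back from crashrep.ini and send it with each report so the server can pick the matching archive.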
Comment 41 Mark Mentovai 2006-03-22 10:02:43 PST
Avoid topic creep.  File new bugs early, and file often.

I opened bug 331357 to track the win32 stuff I'm working on.
Comment 42 Aaron Leventhal 2006-09-06 07:24:16 PDT
The current talkback client is not accessible on Linux. We need a new one based on an accessible widget set like GTK.
Comment 43 Reed Loden [:reed] (use needinfo?) 2006-09-06 07:30:03 PDT
I really doubt Jay can work on a new Talkback client, "thanks" to the NDA tied to the old Talkback code.
Comment 44 Benjamin Smedberg [:bsmedberg] 2006-09-06 07:30:20 PDT
This is now being worked on by the airbag project: http://code.google.com/p/airbag/
Comment 45 Jay Patel [:jay] 2006-09-06 18:17:45 PDT
(In reply to comment #43)
> I really doubt Jay can work on a new Talkback client, "thanks" to the NDA tied
> to the old Talkback code.
> 

Yes, I don't plan on working on any new version of Talkback, since I haven't seen the code and don't plan on looking anytime soon.  All Talkback related bugs in Bugzilla are default assigned to me, since I'm the only person maintaining the servers at MoCo... and many will probably never be fixed... that's what airbag is for! :-)
Comment 46 Jean-Marc Desperrier 2006-09-07 04:12:15 PDT
(In reply to comment #44)
> This is now being worked on by the airbag project:
> http://code.google.com/p/airbag/

Should we mutate this bug into getting airbag integrated into Mozilla (once it's usable), or close it and open a new one?
Comment 47 Vladimir Vukicevic [:vlad] [:vladv] 2006-10-12 15:55:34 PDT

*** This bug has been marked as a duplicate of 354980 ***
Comment 48 Ginn Chen 2007-01-08 23:05:08 PST
*** Bug 118994 has been marked as a duplicate of this bug. ***
Comment 50 Robert Kaiser (not working on stability any more) 2007-06-08 13:51:23 PDT
Should this stay open and assigned to nobody, or actually go FIXED due to breakpad being available or at least get some assignee that marks it when breakpad is fully deployed?
Comment 51 Benjamin Smedberg [:bsmedberg] 2007-06-08 14:06:02 PDT
Let's call this FIXED! woot
Comment 52 Worcester12345 2007-06-09 19:38:29 PDT
Is this now shipping on nightly trunk Seamonkey builds? If so, I am not seeing it.
Comment 53 Robert Kaiser (not working on stability any more) 2007-06-10 03:26:43 PDT
This bug is only about _implementing_ such a tool at all. It is currently shipped on Mac and Windows for Firefox (Linux will come soon), and once it's been tested well enough there, it will be shipped with other applications.

Track bug 383125 for that tool being shipped with SeaMonkey trunk.
