Closed Bug 914253 Opened 11 years ago Closed 10 years ago

rewrite fix-linux-stack.pl in python

Categories

(Testing :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: k0scist, Assigned: dbaron)

References

Details

See https://bugzilla.mozilla.org/show_bug.cgi?id=746243#c52

In order to consolidate our infrastructure around a common language,
python, it'd be nice to rewrite fix-linux-stack.pl in python.
I posted a hacky version of this in bug 948718 if anybody wants to look into it.  Not that my script is necessarily where you want to start.
I'm going to try improving on that, since the external debuginfo bits need updating to support .note.gnu.build-id as described in https://sourceware.org/gdb/current/onlinedocs/gdb/Separate-Debug-Files.html and https://sourceware.org/binutils/docs/ld/Options.html#Options .
Not sure how much effort it's worth, but it would be nice if we could share some logic b/w the osx and linux versions.
So I started from mccr8's version and added the feature that I needed (the new way of doing external debuginfo, at so that I can use dbgsym packages on Ubuntu 14.04 with it), and just landed what I have so far:

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/cfb6de8ff92c
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/9ef4e1a017e9
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/0f99aee39def
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/1acb2c5d135c
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/89a0fa83c6d7
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/e8b07667f3cc

I agree there could probably be some better sharing of the main loop and the buffering, although the rest of the code differs between platforms.  (I also realize, in hindsight, that I hooked up the address adjustment stuff differently from Mac, since mccr8 had removed that bit from his Linux version.)


There's definitely some cleanup, review, and cross-system testing to add here, but now there's something in the tree (and possibly something that works better for developers on current Linux distros than the perl version, too).
Keywords: leave-open
One bug of note: the python version is not as good:

__restore_rt (??:?)

as the perl version was:

__restore_rt (sigaction.c:?)

at dealing with __retore_rt in signal handlers.  Possibly because it insists on file *and* line rather than taking what it can get.
Assignee: nobody → dbaron
I compared the output from a DMD run when put through the Perl and Python versions. Two kinds of differences dominate:

> Perl:    __clone (/build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113) 0xd9e2138d
> Python:  clone (/build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113) 0xd9e2138d

> Perl:    ??? (/usr/lib/x86_64-linux-gnu/gstreamer-0.10/libgstaudioparsers.so) 0x90f092a4
> Python:  ?? (??:0) 0x90f092a4

The latter difference is more significant, though still not a big problem.

The time to run was similar; 1m18s for Perl, 1m22 for Python, though I was compiling at the same time in another window so the results may be imprecise.
(In reply to Nicholas Nethercote [:njn] from comment #6)
> I compared the output from a DMD run when put through the Perl and Python
> versions. Two kinds of differences dominate:
> 
> > Perl:    __clone (/build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113) 0xd9e2138d
> > Python:  clone (/build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113) 0xd9e2138d

I'm not sure what would cause this difference.  Although I remember there being something in the Mac OS X version to trim extra __s, I don't see anything in the Linux version.  Is it possible you have two separate external debuginfo sources for glibc, and the patch switched between the two?

> > Perl:    ??? (/usr/lib/x86_64-linux-gnu/gstreamer-0.10/libgstaudioparsers.so) 0x90f092a4
> > Python:  ?? (??:0) 0x90f092a4
> 
> The latter difference is more significant, though still not a big problem.

What does the input line for this look like?
Flags: needinfo?(n.nethercote)
Actually, never mind.  The perl version had code to explicitly ignore ??:0 from addr2line; we need that in the python version.
Flags: needinfo?(n.nethercote)
So, I think the big reason for differences is a bug in the perl version that led its first two checks for an external debug file to fail all the time because it was missing a "/" between the directory name and the filename.  This matters because the debuglink filename might be the *same* as the original filename.  So if there's a debuglink whose name is the same as the file, the python version is preferring the file itself over external debuginfo in the last two locations to check.  That doesn't seem right, though it is following the documentation at https://sourceware.org/gdb/current/onlinedocs/gdb/Separate-Debug-Files.html .
(And I think the best way to fix that is to implement the CRC checking.)
With the latest changes, the Python version is now better than the Perl version, because of improvements like these (from the diff)

> -    g_slice_alloc (??:?) 0xd1677707
> -    gst_registry_get_feature_list_cookie (??:?) 0x9b788148
> -    gst_registry_binary_read_cache (??:?) 0x9b7ab278
> -    gst_update_registry (??:?) 0x9b7847ed
> +    g_slice_alloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0) 0xd1677707
> +    gst_registry_get_feature_list_cookie (/usr/lib/x86_64-linux-gnu/libgstreamer-0.10.so.0) 0x9b788148
> +    gst_registry_binary_read_cache (/usr/lib/x86_64-linux-gnu/libgstreamer-0.10.so.0) 0x9b7ab278
> +    gst_update_registry (/usr/lib/x86_64-linux-gnu/libgstreamer-0.10.so.0) 0x9b7847ed
(In reply to Nicholas Nethercote [:njn] from comment #13)
> With the latest changes, the Python version is now better than the Perl
> version, because of improvements like these (from the diff)

Yeah, at some point addr2line changed to output "??:?" instead of "??:0", so I made the python version handle either in https://hg.mozilla.org/integration/mozilla-inbound/rev/ed2a7702b5e1
Anybody have thoughts on whether I should mark this bug fixed?

The new version should perhaps be code-reviewed, although I don't think the old one ever was, and NPOTB code doesn't strictly require it.

There's also the question of switching automation to use the new one, which could perhaps be done in bug 948718 although could perhaps happen sooner by just invoking the script.
Keywords: leave-open
> Anybody have thoughts on whether I should mark this bug fixed?

Sounds ok to me. We still need to remove fix-linux-stack.pl, but this bug has had enough patches land that it's probably best to do that in a follow-up bug.
I filed bug 1050601 for removing fix-linux-stack.pl.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
(In reply to David Baron [:dbaron] (UTC-7) (needinfo? for questions) from comment #17)
> The new version should perhaps be code-reviewed, although I don't think the
> old one ever was, and NPOTB code doesn't strictly require it.

I can take a look (or just fix things up and get review for them).

> 
> There's also the question of switching automation to use the new one, which
> could perhaps be done in bug 948718 although could perhaps happen sooner by
> just invoking the script.

I'll do this as well, I'm in the process of reviving the patches to port runreftest.py to mozbase, so I'll need to use this.
You need to log in before you can comment on or make changes to this bug.